[jira] [Commented] (HBASE-3404) Compaction Ordering for Bulk Import Files

Hudson (Commented) (JIRA) Thu, 10 Nov 2011 18:57:49 -0800

    [ https://issues.apache.org/jira/browse/HBASE-3404?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13148234#comment-13148234 ]


Hudson commented on HBASE-3404:
-------------------------------

Integrated in HBase-TRUNK #2427 (See [https://builds.apache.org/job/HBase-TRUNK/2427/])
    HBASE-3690 Option to Exclude Bulk Import Files from Minor Compaction

Summary:
We ran an incremental scrape with HFileOutputFormat and
encountered major compaction storms. This is caused by the bug in
HBASE-3404. The permanent fix is a little tricky without HBASE-2856. We
realized that a quicker solution for avoiding these compaction storms is
to simply exclude bulk import files from minor compactions and let them
only be handled by time-based major compactions. Add this functionality
along with a config option to enable it.

Rewrote this feature to be done on a per-bulkload basis.
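
A minimal sketch of the exclusion idea described above, assuming each
store file carries a flag that is set when it is bulk loaded. The class,
field, and method names here are hypothetical and are not taken from the
actual patch to Store.java or StoreFile.java; the sketch only illustrates
that flagged files are skipped for minor compactions and still picked up
by major compactions.

```java
import java.util.ArrayList;
import java.util.List;

public class MinorCompactionFilter {

    /** Hypothetical stand-in for a store file; not the real StoreFile class. */
    public static final class FileInfo {
        public final String name;
        public final long sizeGb;
        // Assumed per-file flag, set once at bulk-load time.
        public final boolean excludeFromMinorCompaction;

        public FileInfo(String name, long sizeGb, boolean excludeFromMinorCompaction) {
            this.name = name;
            this.sizeGb = sizeGb;
            this.excludeFromMinorCompaction = excludeFromMinorCompaction;
        }
    }

    /**
     * Returns the files eligible for a compaction. A major compaction takes
     * every file; a minor compaction skips files flagged for exclusion.
     */
    public static List<FileInfo> selectCandidates(List<FileInfo> files, boolean majorCompaction) {
        List<FileInfo> candidates = new ArrayList<>();
        for (FileInfo f : files) {
            if (majorCompaction || !f.excludeFromMinorCompaction) {
                candidates.add(f);
            }
        }
        return candidates;
    }

    public static void main(String[] args) {
        List<FileInfo> files = new ArrayList<>();
        files.add(new FileInfo("bulk", 2L, true));
        files.add(new FileInfo("flushed-a", 25L, false));
        files.add(new FileInfo("flushed-b", 2L, false));

        System.out.println("minor candidates: " + selectCandidates(files, false).size());
        System.out.println("major candidates: " + selectCandidates(files, true).size());
    }
}
```

Keeping the flag on the file itself, rather than in a cluster-wide
setting alone, is one way to read "on a per-bulkload basis": each load
decides whether its output files take part in minor compactions.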

Test Plan:
 - mvn test -Dtest=TestHFileOutputFormat

DiffCamp Revision:

Reviewers: stack, Kannan, JIRA, dhruba

Reviewed By: stack

CC: dhruba, lhofhansl, nspiegelberg, stack

Differential Revision: 357

nspiegelberg : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestHFileOutputFormat.java

                
> Compaction Ordering for Bulk Import Files
> -----------------------------------------
>
>                 Key: HBASE-3404
>                 URL: https://issues.apache.org/jira/browse/HBASE-3404
>             Project: HBase
>          Issue Type: Bug
>    Affects Versions: 0.90.0, 0.90.1, 0.92.0
>            Reporter: Nicolas Spiegelberg
>            Assignee: Nicolas Spiegelberg
>
> We ran into an issue today while using HFileOutputFormat to perform 
> an incremental load on an already-large cluster.  Because bulk-loaded files 
> don't have a sequence ID, they are put at the front of the StoreFile list.  
> This resulted in the following StoreFile ordering:
> 2GB (bulk) => 25GB => 2GB => ...
> This triggered a 30+GB major compaction for every single region.  
> Optimally, we would like bulk import files to be ordered in the compaction 
> list by their time of insertion, so that this becomes a much smaller 
> compaction, and to rely on StoreFile age to trigger major compactions.
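
The ordering problem in the description can be reproduced with a small
sketch, assuming files are sorted by sequence ID and a file without one
is given a sentinel value that sorts first. The sentinel, class, and
method names are hypothetical and only illustrate the effect; they are
not the HBase implementation.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class StoreFileOrdering {

    // Assumed sentinel for "no sequence ID"; it sorts before any real ID.
    public static final long NO_SEQUENCE_ID = -1L;

    /** Hypothetical stand-in for a store file; not the real StoreFile class. */
    public static final class FileInfo {
        public final String name;
        public final long sizeGb;
        public final long sequenceId;

        public FileInfo(String name, long sizeGb, long sequenceId) {
            this.name = name;
            this.sizeGb = sizeGb;
            this.sequenceId = sequenceId;
        }
    }

    /** Sorts a copy of the list by ascending sequence ID and returns labels. */
    public static List<String> orderBySequenceId(List<FileInfo> files) {
        List<FileInfo> sorted = new ArrayList<>(files);
        sorted.sort(Comparator.comparingLong((FileInfo f) -> f.sequenceId));
        List<String> labels = new ArrayList<>();
        for (FileInfo f : sorted) {
            labels.add(f.name + " (" + f.sizeGb + "GB)");
        }
        return labels;
    }

    public static void main(String[] args) {
        List<FileInfo> files = new ArrayList<>();
        files.add(new FileInfo("flushed-old", 25L, 100L));
        files.add(new FileInfo("flushed-new", 2L, 200L));
        // The bulk-loaded file arrives last but has no sequence ID.
        files.add(new FileInfo("bulk", 2L, NO_SEQUENCE_ID));

        // prints [bulk (2GB), flushed-old (25GB), flushed-new (2GB)]
        System.out.println(orderBySequenceId(files));
    }
}
```

Although the bulk file is the most recent arrival, it lands ahead of the
25GB file, which matches the "2GB (bulk) => 25GB => 2GB" ordering
reported above.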

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
