[ 
https://issues.apache.org/jira/browse/HBASE-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924792#action_12924792
 ] 

Nicolas Spiegelberg commented on HBASE-2462:
--------------------------------------------

@stack: 

1. The FS default blocksize is the default for hlog.blocksize when no custom 
value is set, but the two are not necessarily equal.  The idea is that newly 
created HFiles should always be <= hlog.blocksize, so we unconditionally compact 
HFiles that have not already been compacted at least once.

2. The idea behind step #4 is that compaction becomes extremely useful when 
you can use it to dedupe.  We should definitely use the compactionThreshold 
value here instead of a hard-coded 3.  However, I don't think this should be an 
absolute number of StoreFiles, but rather the number of relatively small 
StoreFiles.  If you have huge region sizes (e.g. a large object store), then you 
don't mind having 6 StoreFiles and really just want to compact when it will 
save a decent amount of space.

3. This algorithm will perform roughly the same when compacting small/new 
files; however, it will be more aggressive about including older files in the 
compaction because it can more quickly detect when it's advantageous to 
compact.  Because of the 4x (vs. 2x) multiplier, it's 2x more scalable and 
should result in half as many large StoreFiles for large regions.  For 
DEFAULT_MAX_FILE_SIZE == 256MB, you should never have more than 5 StoreFiles 
before triggering a split.
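
To make points 1 and 2 concrete, here is a rough sketch of the two checks as 
I read them.  It is not the actual Store.java code: CompactionSketch, FileInfo, 
and the smallFileLimitBytes cutoff for "relatively small" are all made-up names 
and assumptions for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration only; not HBase's Store/StoreFile API. */
final class CompactionSketch {

    /** Minimal stand-in for a store file (assumed fields, not HBase's). */
    static final class FileInfo {
        final long sizeBytes;
        final boolean compactedBefore;

        FileInfo(long sizeBytes, boolean compactedBefore) {
            this.sizeBytes = sizeBytes;
            this.compactedBefore = compactedBefore;
        }
    }

    /** Point 1: files that have never been compacted are always selected. */
    static List<FileInfo> neverCompacted(List<FileInfo> files) {
        List<FileInfo> selected = new ArrayList<>();
        for (FileInfo f : files) {
            if (!f.compactedBefore) {
                selected.add(f);
            }
        }
        return selected;
    }

    /**
     * Point 2: compare compactionThreshold against the number of relatively
     * small files rather than the total file count. What counts as "small"
     * is an assumption here: any file at or below smallFileLimitBytes.
     */
    static boolean enoughSmallFiles(List<FileInfo> files,
                                    long smallFileLimitBytes,
                                    int compactionThreshold) {
        int small = 0;
        for (FileInfo f : files) {
            if (f.sizeBytes <= smallFileLimitBytes) {
                small++;
            }
        }
        return small >= compactionThreshold;
    }
}
```

With this shape, a region holding a few huge StoreFiles and one small one would 
not trip the threshold, which is the behavior point 2 argues for.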

> Review compaction heuristic and move compaction code out so standalone and 
> independently testable
> -------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-2462
>                 URL: https://issues.apache.org/jira/browse/HBASE-2462
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: stack
>            Assignee: Jonathan Gray
>            Priority: Critical
>
> Anything that improves our i/o profile makes hbase run smoother.  Over in 
> HBASE-2457, good work has been done already describing the tension between 
> minimizing compactions versus minimizing count of store files.  This issue is 
> about following on from what has been done in 2457 but also breaking the 
> hard-to-read compaction code out of Store.java into a standalone class that 
> can be more easily tested (and easily analyzed for its performance 
> characteristics).
> If possible, in the refactor, we'd allow specification of alternate merge 
> sort implementations. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
