[ 
https://issues.apache.org/jira/browse/HBASE-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924664#action_12924664
 ] 

Nicolas Spiegelberg commented on HBASE-2462:
--------------------------------------------

So, we've been talking about a new compaction algorithm internally and wanted 
to get external feedback as well...

The existing store file selection algorithm does not seem to use enough 
context.  It starts at the oldest file and compacts everything newer once a 
file is no longer 2x the size of the next oldest.  It seems like we want to 
approach from the opposite direction:

1. Start at the newest file
2. Unconditionally compact as long as the StoreFiles are less than a certain
size (thinking "hbase.regionserver.hlog.blocksize").
3. Once that size threshold has been passed, if next oldest file < sum(all 
newer files) * R, we include it in the compaction.  R = 2.
4. If files-to-compact < max(HColumnDescriptor.maxVersions(),3), skip the 
compaction.
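
A rough sketch of steps 1-4 above, in Python for brevity rather than HBase's 
Java.  The function name, the parameter names, and the reading of step 2 as a 
per-file size check are assumptions for illustration, not code from Store.java.

```python
from typing import List


def select_files_to_compact(
    sizes_newest_first: List[int],
    min_size: int,
    ratio: float = 2.0,
    max_versions: int = 3,
) -> List[int]:
    """Return indexes (0 = newest) of the files to compact, or [] to skip.

    Hypothetical helper: sizes may be in any consistent unit, and min_size
    stands in for the size threshold suggested in step 2.
    """
    selected: List[int] = []
    newer_total = 0
    i = 0  # Step 1: index 0 is the newest file.

    # Step 2: take files unconditionally while each one is below min_size
    # (assumed here to be a per-file check).
    while i < len(sizes_newest_first) and sizes_newest_first[i] < min_size:
        selected.append(i)
        newer_total += sizes_newest_first[i]
        i += 1

    # Step 3: keep adding the next-oldest file while it is smaller than
    # ratio * (sum of all newer files); stop at the first file that fails.
    while i < len(sizes_newest_first) and sizes_newest_first[i] < newer_total * ratio:
        selected.append(i)
        newer_total += sizes_newest_first[i]
        i += 1

    # Step 4: skip the compaction if too few files were selected.
    if len(selected) < max(max_versions, 3):
        return []
    return selected
```

With sizes of 10, 20, 30, 100, 500, 5000 MB (newest first) and a 64 MB 
threshold, this reading selects the four newest files: 100 < 2 * 60 passes, 
while 500 < 2 * 160 fails.  Note that under this literal reading nothing is 
selected when even the newest file is already above the threshold.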

This algorithm can serve a very generic workload.  Axiom: It's worth compacting 
if sum(files) >= 150% * max(files).  Maybe make this adjustable.  The main 
point is that the ratio between file[i], file[i+1] is less useful than 
sum(files), max(files).
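
The axiom can be written as a one-line check.  This is only a sketch: the 
function name and the `threshold` parameter (the "adjustable" 150%) are 
hypothetical.

```python
from typing import Sequence


def worth_compacting(sizes: Sequence[int], threshold: float = 1.5) -> bool:
    """True if sum(files) >= threshold * max(files), per the axiom above."""
    if not sizes:
        return False
    return sum(sizes) >= threshold * max(sizes)
```

For example, files of 100 and 60 MB qualify (160 >= 150), while files of 100 
and 10 MB do not (110 < 150).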

A. With files[i] < files[i+1] * 2, our worst case ends up with a decreasing 
triangle of 2x.
B. With files[i] < sum(files[0..i-1]) * 2, we are dealing with the derivative.  
Our worst case ends up with decreasing triangle of 4x

With a 4x ratio & 64 MB hlog blocksize, we could support up to a 21.3GB Store 
while using less than 8 files.  3 minimal threshold files + 5 worst case files 
that would be roughly: 64MB, 256MB, 1GB, 4GB, 16GB == 21.3GB.  Assuming that 
the average user has a 1-2 GB store, the number of HFiles should never get 
above 6.
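
The arithmetic for the five worst-case files can be checked with a short 
script.  The helper name and its defaults are assumptions taken from the 
figures in the paragraph above (64 MB base, 4x ratio, 5 files).

```python
from typing import List


def worst_case_sizes_mb(base_mb: int = 64, ratio: int = 4, count: int = 5) -> List[int]:
    """Sizes in MB of a worst-case run where each file is ratio x the previous."""
    return [base_mb * ratio**k for k in range(count)]


sizes_mb = worst_case_sizes_mb()   # [64, 256, 1024, 4096, 16384]
total_gb = sum(sizes_mb) / 1024    # 21.3125, i.e. roughly 21.3GB
```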


> Review compaction heuristic and move compaction code out so standalone and 
> independently testable
> -------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-2462
>                 URL: https://issues.apache.org/jira/browse/HBASE-2462
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: stack
>            Assignee: Jonathan Gray
>            Priority: Critical
>
> Anything that improves our i/o profile makes hbase run smoother.  Over in 
> HBASE-2457, good work has been done already describing the tension between 
> minimizing compactions versus minimizing count of store files.  This issue is 
> about following on from what has been done in 2457 but also, breaking the 
> hard-to-read compaction code out of Store.java into a standalone class that 
> can be more easily tested (and easily analyzed for its performance 
> characteristics).
> If possible, in the refactor, we'd allow specification of alternate merge 
> sort implementations. 

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
