[
https://issues.apache.org/jira/browse/HBASE-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924664#action_12924664
]
Nicolas Spiegelberg commented on HBASE-2462:
--------------------------------------------
So, we've been talking about a new compaction algorithm internally and wanted
to get external feedback as well...
The existing store file selection algorithm doesn't seem to use enough context.
We start at the oldest file and compact everything else once it's no longer 2x
the size of the next oldest. It seems like we want to approach this from the
opposite direction (a rough sketch of the selection loop follows the list):
1. Start at the newest file
2. Unconditionally compact as long as the StoreFiles are less than a certain
size (thinking "hbase.regionserver.hlog.blocksize").
3. Once that size threshold has been passed, if the next oldest file <
sum(all newer files) * R, we include it in the compaction. R = 2.
4. If files-to-compact < max(HColumnDescriptor.maxVersions(),3), skip the
compaction
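Roughly, in code (just a sketch to make the selection loop concrete; the class
and method names, and the rule that the first file failing the check ends the
selection, are my assumptions rather than what Store.java does today):
{code:java}
// Sketch only: hypothetical class/method names, not the actual Store.java code.
import java.util.ArrayList;
import java.util.List;

public class CompactSelectionSketch {

  /**
   * @param sizes             store file sizes in bytes, ordered newest first
   * @param minSize           unconditional-compact threshold, e.g.
   *                          hbase.regionserver.hlog.blocksize
   * @param ratio             R from step 3, e.g. 2.0
   * @param minFilesToCompact max(HColumnDescriptor.maxVersions(), 3)
   * @return sizes of the files selected for compaction, newest first
   */
  static List<Long> select(List<Long> sizes, long minSize, double ratio,
      int minFilesToCompact) {
    List<Long> toCompact = new ArrayList<Long>();
    long sumNewer = 0;                    // total size of files already selected
    for (long size : sizes) {             // 1. start at the newest file
      if (size < minSize                  // 2. always take files under the blocksize
          || size < sumNewer * ratio) {   // 3. take the next oldest while < sum(newer) * R
        toCompact.add(size);
        sumNewer += size;
      } else {
        break;  // assumption: the first file failing the check ends the selection
      }
    }
    if (toCompact.size() < minFilesToCompact) {
      toCompact.clear();                  // 4. too few files, skip the compaction
    }
    return toCompact;
  }
}
{code}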
This algorithm can serve a very generic workload. Axiom: it's worth compacting
if sum(files) >= 150% * max(files). Maybe make that percentage adjustable. The
main point is that the ratio between file[i] and file[i+1] is less useful than
the ratio between sum(files) and max(files).
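In the same hypothetical sketch class, that axiom is just a sum-vs-max check,
with the 150% factor read from a config knob rather than hard-coded:
{code:java}
  // Sketch: "worth compacting" iff sum(files) >= factor * max(files).
  // The factor (e.g. 1.5) would be a tunable setting, per the note above.
  static boolean worthCompacting(List<Long> sizes, double factor) {
    long sum = 0;
    long max = 0;
    for (long s : sizes) {
      sum += s;
      max = Math.max(max, s);
    }
    return sum >= factor * max;
  }
{code}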
A. With files[i] < files[i+1] * 2, our worst case ends up as a decreasing
triangle of 2x.
B. With files[i] < sum(files[0..i-1]) * 2, we are dealing with the derivative;
our worst case ends up as a decreasing triangle of 4x.
With a 4x ratio & a 64MB hlog blocksize, we could support up to a 21.4GB Store
while using no more than 8 files: 3 minimum-threshold files + 5 worst-case
files, roughly 64MB, 256MB, 1GB, 4GB, 16GB == 21.3GB. Assuming the average
user has a 1-2GB store, the number of HFiles should never get above 6.
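To sanity-check that arithmetic (a throwaway main in the same sketch class;
sizes in MB, newest first): each of those five files is at least 2x the sum of
everything newer, so with R = 2 none of them would be pulled into a compaction.
{code:java}
  public static void main(String[] args) {
    long[] mb = {64, 256, 1024, 4096, 16384};   // the worst-case sizes above
    long sumNewer = 0;
    for (long size : mb) {
      System.out.printf("%6d MB vs sum(newer) * 2 = %6d MB, compact? %b%n",
          size, sumNewer * 2, size < sumNewer * 2);   // always false here
      sumNewer += size;
    }
    System.out.println("total = " + sumNewer + " MB");  // 21824 MB ~= 21.3GB
  }
{code}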
> Review compaction heuristic and move compaction code out so standalone and
> independently testable
> -------------------------------------------------------------------------------------------------
>
> Key: HBASE-2462
> URL: https://issues.apache.org/jira/browse/HBASE-2462
> Project: HBase
> Issue Type: Improvement
> Reporter: stack
> Assignee: Jonathan Gray
> Priority: Critical
>
> Anything that improves our i/o profile makes hbase run smoother. Over in
> HBASE-2457, good work has been done already describing the tension between
> minimizing compactions versus minimizing count of store files. This issue is
> about following on from what has been done in 2457 but also breaking the
> hard-to-read compaction code out of Store.java into a standalone class that
> can be more easily tested (and more easily analyzed for its performance
> characteristics).
> If possible, in the refactor, we'd allow specification of alternate merge
> sort implementations.