[
https://issues.apache.org/jira/browse/HBASE-2462?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924792#action_12924792
]
Nicolas Spiegelberg commented on HBASE-2462:
--------------------------------------------
@stack:
1. The FS default blocksize is the default value for hlog.blocksize when it is
not customized, but the two are not necessarily 1-1. The idea is that newly
created HFiles should always be <= hlog.blocksize, so we unconditionally compact
HFiles that have not already been compacted at least once.
2. The idea behind step #4 is that compaction becomes extremely useful when
you can use it to dedupe. We should definitely use the compactionThreshold
metric here instead of a hard-coded 3. However, I don't think this should be an
absolute number of StoreFiles, but rather the number of relatively small
StoreFiles. If you have huge region sizes (e.g. a large object store), then you
don't mind having 6 StoreFiles and really just want to compact when it will
save a decent amount of space.
3. This algorithm will perform roughly the same for compacting small/new files;
however, it will be more aggressive about including older files in the
compaction because it can more quickly detect when it's advantageous to
compact. Because of the 4x (vs. 2x) multiplier, it's 2x more scalable and
should result in half as many large StoreFiles for large regions. For
DEFAULT_MAX_FILE_SIZE == 256MB, you should never have more than 5 StoreFiles
before triggering a split.
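
To make the three points above concrete, here is a rough sketch of one way a
ratio-based selection could look. It is an illustration only, not the patch
itself: the function name, the parameter names, and the exact inclusion rule
(an older file is pulled in once it is no larger than ratio times the total
size of the newer files) are assumptions, and the sizes are made-up numbers in
MB.

```python
from typing import List


def select_for_compaction(sizes: List[int],
                          min_compact_size: int,
                          ratio: float = 4.0,
                          compaction_threshold: int = 3) -> List[int]:
    """Return indices of the files to compact, or [] if not worthwhile.

    Hypothetical sketch. `sizes` is ordered oldest to newest.
    `min_compact_size` stands in for hlog.blocksize: files at or below it
    are always eligible. `ratio` stands in for the 4x (vs. 2x) multiplier.
    """
    start = 0
    while start < len(sizes):
        newer_total = sum(sizes[start + 1:])
        small_enough = sizes[start] <= min_compact_size
        worth_including = sizes[start] <= ratio * newer_total
        if small_enough or worth_including:
            break
        # This file is too large relative to everything newer; skip it.
        start += 1

    selected = list(range(start, len(sizes)))
    # Only compact when enough files are involved (assumed to map to
    # compactionThreshold rather than a hard-coded 3).
    if len(selected) < compaction_threshold:
        return []
    return selected


# One 100 MB file followed by three 10 MB flushes.
sizes_mb = [100, 10, 10, 10]
with_4x = select_for_compaction(sizes_mb, min_compact_size=64, ratio=4.0)
with_2x = select_for_compaction(sizes_mb, min_compact_size=64, ratio=2.0)
```

Under these assumptions the 4x ratio pulls the 100 MB file into the compaction
(100 <= 4 * 30), while the 2x ratio leaves it out (100 > 2 * 30) and compacts
only the three small files, which is the "more aggressive about including
older files" behavior described in point 3.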
> Review compaction heuristic and move compaction code out so standalone and
> independently testable
> -------------------------------------------------------------------------------------------------
>
> Key: HBASE-2462
> URL: https://issues.apache.org/jira/browse/HBASE-2462
> Project: HBase
> Issue Type: Improvement
> Reporter: stack
> Assignee: Jonathan Gray
> Priority: Critical
>
> Anything that improves our i/o profile makes hbase run smoother. Over in
> HBASE-2457, good work has been done already describing the tension between
> minimizing compactions versus minimizing count of store files. This issue is
> about following on from what has been done in 2457, but also about breaking the
> hard-to-read compaction code out of Store.java into a standalone class that
> can be more easily tested (and easily analyzed for its performance
> characteristics).
> If possible, in the refactor, we'd allow specification of alternate merge
> sort implementations.