[
https://issues.apache.org/jira/browse/HBASE-8283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13626021#comment-13626021
]
Elliott Clark commented on HBASE-8283:
--------------------------------------
There are really three goals.
h2.Goal One
The first goal is to fix bulk load files. Right now their ordering gets messed
up after a compaction happens. This leads to some weird compactions where the
smallest files are being compacted with the largest. This is possible because
the compaction policy right now approves the candidates list as soon as one
file is less than or equal to the files after it. The bulk loaded files are
always on the left. The new large file created from compaction does not
have the bulk load flag (that's lost) and it will have a seqId of 0.
h2.Goal Two
The other goal is to only compact files that are all inside of a ratio. All
candidate files are selected if there is one file to the left that satisfies
the ratio SizeFile(j) <= SumFileSize( j-1, 0). Workloads where there are large
fluctuations can select weird groups of files.
Suppose there's a write workload that's heavily sinusoidal.
[ 1 1 50 150 180 150 50 1 1 1 1 ]
Currently we'd pick 1 1 50 as the files to compact.
1 1 1 are the most like each other.
150 180 150 are also more similar and would logically be better matches than
the ones currently picked.
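To make "all inside of a ratio" concrete, here is a minimal sketch of one way
to test a group of files. It is not the code from the patch. The class name
RatioCheck, the method name allInRatio, and the ratio value 1.2 are assumptions
made for illustration. A group passes only if every file is no larger than the
ratio times the combined size of the other files in the group.

```java
import java.util.Arrays;
import java.util.List;

public class RatioCheck {
    /**
     * Returns true when every file in the group is no larger than
     * ratio * (combined size of the other files in the group).
     * Groups with fewer than two files pass trivially.
     */
    public static boolean allInRatio(List<Long> sizes, double ratio) {
        if (sizes.size() < 2) {
            return true;
        }
        long total = 0;
        for (long size : sizes) {
            total += size;
        }
        for (long size : sizes) {
            if (size > (total - size) * ratio) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        double ratio = 1.2; // assumed value, for illustration only
        System.out.println(allInRatio(Arrays.asList(1L, 1L, 50L), ratio));
        System.out.println(allInRatio(Arrays.asList(150L, 180L, 150L), ratio));
        System.out.println(allInRatio(Arrays.asList(1L, 1L, 1L), ratio));
    }
}
```

Under this check the group 1 1 50 fails, because 50 is far larger than the two
small files combined, while 150 180 150 and 1 1 1 both pass.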
h2.Goal Three
Just because files are picked doesn't mean they are the best choice. Right now
our compaction algorithm is pretty naive. This is a cut at choosing files
based on more than one heuristic (ratio, num files removed, and IO required).
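As a rough sketch of what choosing on more than one heuristic could look like
(again, not the code from the patch), the example below walks every contiguous
window of files, keeps only windows where all files are in ratio, and prefers
the window that removes the most files, breaking ties by the smallest total
size as a stand-in for IO required. The names ExploreSketch and selectBest, the
minFiles and maxFiles parameters, the ratio value, and the tie-breaking order
are all assumptions.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class ExploreSketch {
    /** Sum of the file sizes in a group. */
    public static long totalSize(List<Long> sizes) {
        long total = 0;
        for (long size : sizes) {
            total += size;
        }
        return total;
    }

    /** True when each file is <= ratio * (size of the rest of the group). */
    public static boolean allInRatio(List<Long> sizes, double ratio) {
        if (sizes.size() < 2) {
            return true;
        }
        long total = totalSize(sizes);
        for (long size : sizes) {
            if (size > (total - size) * ratio) {
                return false;
            }
        }
        return true;
    }

    /**
     * Explores every contiguous window of minFiles..maxFiles files and
     * returns the in-ratio window with the most files. Among windows with
     * the same file count, the one with the smaller total size wins.
     * Returns an empty list when no window qualifies.
     */
    public static List<Long> selectBest(List<Long> files, int minFiles,
                                        int maxFiles, double ratio) {
        List<Long> best = new ArrayList<>();
        long bestSize = 0;
        for (int start = 0; start < files.size(); start++) {
            for (int end = start + minFiles;
                 end <= files.size() && end - start <= maxFiles; end++) {
                List<Long> candidate = files.subList(start, end);
                if (!allInRatio(candidate, ratio)) {
                    continue;
                }
                long size = totalSize(candidate);
                boolean moreFiles = candidate.size() > best.size();
                boolean sameFilesLessIo =
                    candidate.size() == best.size() && size < bestSize;
                if (moreFiles || sameFilesLessIo) {
                    best = new ArrayList<>(candidate);
                    bestSize = size;
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Long> files = Arrays.asList(
            1L, 1L, 50L, 150L, 180L, 150L, 50L, 1L, 1L, 1L, 1L);
        // Window capped at three files to keep the example small.
        System.out.println(selectBest(files, 3, 3, 1.2));
    }
}
```

With the window capped at three files, this sketch selects 1 1 1 from the
sinusoidal example: it ties 150 180 150 on files removed but needs much less
IO. With a larger cap the file count dominates, so wider windows win as long
as they stay in ratio.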
> Backport HBASE-7842 Add compaction policy that explores more storefile groups
> to 0.94
> -------------------------------------------------------------------------------------
>
> Key: HBASE-8283
> URL: https://issues.apache.org/jira/browse/HBASE-8283
> Project: HBase
> Issue Type: Task
> Components: Compaction
> Affects Versions: 0.94.0
> Reporter: Elliott Clark
> Assignee: Elliott Clark
> Attachments: HBASE-8283-0.patch
>
>
> HBASE-7842 Add compaction policy that explores more storefile groups
> Added a new compaction policy that greatly improves selecting files if there
> are bulk loaded files.