[ https://issues.apache.org/jira/browse/HBASE-7842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13617833#comment-13617833 ]
Elliott Clark commented on HBASE-7842:
--------------------------------------
I'd rather leave it there, just in case someone has a workload that relies on
the old behavior. Perhaps it should be renamed?
> Add compaction policy that explores more storefile groups
> ---------------------------------------------------------
>
> Key: HBASE-7842
> URL: https://issues.apache.org/jira/browse/HBASE-7842
> Project: HBase
> Issue Type: New Feature
> Components: Compaction
> Reporter: Elliott Clark
> Assignee: Elliott Clark
> Attachments: HBASE-7842-0.patch, HBASE-7842-2.patch,
> HBASE-7842-3.patch, HBASE-7842-4.patch, HBASE-7842-5.patch
>
>
> Some workloads that are not as stable can have compactions that are too large
> or too small using the current storefile selection algorithm.
> Currently:
> * Find the first file fi where FileSize(fi) <= Sum(0, i-1, FileSize(fx))
> * Ensure that there are at least the min number of files (if there aren't,
> then bail out)
> * If there are too many files, keep the larger ones.
> I would propose something like:
> * Find all sets of storefiles where every file satisfies
> ** FileSize(fi) <= Sum(0, i-1, FileSize(fx))
> ** Num files in set <= max
> ** Num files in set >= min
> * Then pick the set of files that maximizes ((# storefiles in set) /
> Sum(FileSize(fx)))
> The thinking is that the above algorithm is pretty easy to reason about, all
> files satisfy the ratio, and it should rewrite the least amount of data to get
> the biggest impact in seeks.
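
The proposed selection in the quoted description can be sketched roughly as
follows. This is only an illustration under stated assumptions, not the code in
the attached patches: candidate sets are assumed to be contiguous runs of
storefiles, the ratio check is read as "each file is no larger than the sum of
the other files in the set", and the names select_files and satisfies_ratio are
hypothetical.

```python
from typing import List, Optional, Tuple


def satisfies_ratio(sizes: List[int]) -> bool:
    """True if every file is no larger than the sum of the other files.

    Assumption: this is one reading of FileSize(fi) <= Sum(0, i-1, FileSize(fx)).
    """
    total = sum(sizes)
    return all(size <= total - size for size in sizes)


def select_files(
    sizes: List[int], min_files: int, max_files: int
) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the best window, end exclusive, or None.

    Assumption: only contiguous windows of the storefile list are explored.
    The best window maximizes (number of files) / (total bytes rewritten).
    """
    best = None
    best_score = 0.0
    for start in range(len(sizes)):
        last_end = min(start + max_files, len(sizes))
        for end in range(start + min_files, last_end + 1):
            window = sizes[start:end]
            total = sum(window)
            # Skip empty-size windows (avoids division by zero) and
            # windows where some file dominates the rest.
            if total == 0 or not satisfies_ratio(window):
                continue
            score = len(window) / total
            if score > best_score:
                best, best_score = (start, end), score
    return best
```

For example, with sizes [1000, 10, 12, 11, 500], min 3 and max 4, the only
window that passes the ratio check is the three small files, so
select_files returns (1, 4) and the two large files are left alone.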