[
https://issues.apache.org/jira/browse/HBASE-7842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Elliott Clark updated HBASE-7842:
---------------------------------
Resolution: Fixed
Fix Version/s: 0.98.0
0.95.1
Release Note: The default compaction policy has been changed to a new policy
that explores more groups of files and is stricter about enforcing the size
ratio requirements.
Hadoop Flags: Reviewed
Status: Resolved (was: Patch Available)
> Add compaction policy that explores more storefile groups
> ---------------------------------------------------------
>
> Key: HBASE-7842
> URL: https://issues.apache.org/jira/browse/HBASE-7842
> Project: HBase
> Issue Type: New Feature
> Components: Compaction
> Reporter: Elliott Clark
> Assignee: Elliott Clark
> Fix For: 0.95.1, 0.98.0
>
> Attachments: HBASE-7842-0.patch, HBASE-7842-2.patch,
> HBASE-7842-3.patch, HBASE-7842-4.patch, HBASE-7842-5.patch,
> HBASE-7842-6.patch, HBASE-7842-7.patch
>
>
> Workloads that are less stable can end up with compactions that are too large
> or too small under the current storefile selection algorithm.
> Currently (sketched below):
> * Find the first file fi such that FileSize(fi) <= Sum(0, i-1, FileSize(fx))
> * Ensure that at least the minimum number of files is selected (if there
> aren't enough, bail out)
> * If there are too many files, keep the larger ones.
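> A minimal sketch of that current selection, under some simplifying assumptions
> (candidate files are already sorted and represented only by their sizes; the
> real policy works on StoreFile objects and also applies a configurable ratio).
> The class, method, and parameter names are illustrative, not the actual HBase
> API:
> {code:java}
> import java.util.ArrayList;
> import java.util.Collections;
> import java.util.List;
>
> public class CurrentSelectionSketch {
>   /**
>    * Pick files the "current" way: scan for the first file whose size is no
>    * larger than the sum of the files before it, take that file and everything
>    * after it, bail out if fewer than minFiles remain, and if more than
>    * maxFiles remain keep only the larger ones.
>    */
>   static List<Long> select(List<Long> sizes, int minFiles, int maxFiles) {
>     long sumSoFar = 0;
>     int start = -1;
>     for (int i = 0; i < sizes.size(); i++) {
>       if (i > 0 && sizes.get(i) <= sumSoFar) {  // FileSize(fi) <= Sum(0, i-1, FileSize(fx))
>         start = i;
>         break;
>       }
>       sumSoFar += sizes.get(i);
>     }
>     if (start < 0 || sizes.size() - start < minFiles) {
>       return Collections.emptyList();           // not enough files: bail out
>     }
>     List<Long> picked = new ArrayList<>(sizes.subList(start, sizes.size()));
>     if (picked.size() > maxFiles) {
>       picked.sort(Collections.reverseOrder());  // too many files: keep the larger ones
>       picked = new ArrayList<>(picked.subList(0, maxFiles));
>     }
>     return picked;
>   }
> }
> {code}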
> I would propose something like:
> * Find all sets of storefiles where every file satisfies
> ** FileSize(fi) <= Sum(0, i-1, FileSize(fx))
> ** Num files in set <= max
> ** Num files in set >= min
> * Then pick the set of files that maximizes ((# storefiles in set) /
> Sum(FileSize(fx)))
> The thinking is that the above algorithm is pretty easy to reason about: every
> file in the chosen set satisfies the ratio, and it should rewrite the least
> amount of data for the biggest impact on seeks. A rough sketch of this
> selection is below.
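> The sketch uses the same simplifications as above, with contiguous windows of
> the sorted candidate list standing in for the "sets" of storefiles; again the
> names and signatures are illustrative only:
> {code:java}
> import java.util.ArrayList;
> import java.util.Collections;
> import java.util.List;
>
> public class ExploringSelectionSketch {
>   /**
>    * Enumerate every contiguous window of the sorted candidate list with
>    * between minFiles and maxFiles members in which every file (after the
>    * first) is no larger than the sum of the files before it, then return the
>    * window that maximizes (number of files) / (total size).
>    */
>   static List<Long> select(List<Long> sizes, int minFiles, int maxFiles) {
>     List<Long> best = Collections.emptyList();
>     double bestScore = -1.0;
>     for (int start = 0; start < sizes.size(); start++) {
>       int maxEnd = Math.min(sizes.size(), start + maxFiles);
>       for (int end = start + minFiles; end <= maxEnd; end++) {
>         List<Long> window = sizes.subList(start, end);
>         if (!satisfiesRatio(window)) {
>           continue;                             // some file breaks the size ratio requirement
>         }
>         long totalSize = 0;
>         for (long s : window) {
>           totalSize += s;
>         }
>         double score = (double) window.size() / totalSize;  // files compacted per byte rewritten
>         if (score > bestScore) {
>           bestScore = score;
>           best = new ArrayList<>(window);
>         }
>       }
>     }
>     return best;
>   }
>
>   /** Every file after the first must be no larger than the sum of the files before it. */
>   private static boolean satisfiesRatio(List<Long> window) {
>     if (window.isEmpty()) {
>       return false;                             // degenerate window: nothing to compact
>     }
>     long sumSoFar = window.get(0);
>     for (int i = 1; i < window.size(); i++) {
>       if (window.get(i) > sumSoFar) {
>         return false;
>       }
>       sumSoFar += window.get(i);
>     }
>     return true;
>   }
> }
> {code}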