[ https://issues.apache.org/jira/browse/HBASE-6371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13478543#comment-13478543 ]
Lars Hofhansl commented on HBASE-6371:
--------------------------------------

Specifically, a scenario I'd be interested in is keeping a day's (or two) worth of changes in a live HBase cluster.
In extreme cases this might lead to 1000's of versions, and scan performance of the latest version suffers significantly, *especially* after a major compaction, which will cause all versions of KVs to be jumbled together in the same file.

> [89-fb] Tier based compaction
> -----------------------------
>
>                 Key: HBASE-6371
>                 URL: https://issues.apache.org/jira/browse/HBASE-6371
>             Project: HBase
>          Issue Type: Improvement
>            Reporter: Akashnil
>            Assignee: Liyin Tang
>              Labels: noob
>
> Currently, the compaction selection is not very flexible and is not sensitive
> to the hotness of the data. Very old data is likely to be accessed less, and
> very recent data is likely to be in the block cache. Both of these
> considerations make it inefficient to compact these files as aggressively as
> other files. In some use-cases, the access pattern is particularly obvious,
> yet there is no way to control the compaction algorithm in those
> cases.
> In the new compaction selection algorithm, we plan to divide the candidate
> files into different levels according to the age of the data that is present
> in those files. For each level, parameters like compaction ratio and minimum
> number of store-files in each compaction may be different. Number of levels,
> time-ranges, and parameters for each level will be configurable online on a
> per-column family basis.
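
To make the proposal in the issue description concrete, here is a rough sketch of what per-tier selection could look like. It is not the HBASE-6371 patch: the names `StoreFile`, `Tier`, `select_in_tier` and `select_by_tier` are hypothetical, the tier boundaries and parameters are made up, and the ratio test is a simplified version of ratio-based selection (skip an older file when it is larger than the ratio times the total size of the newer files in the same tier).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class StoreFile:
    name: str
    size_bytes: int
    age_hours: float  # age of the data in the file (assumed to be known)


@dataclass(frozen=True)
class Tier:
    max_age_hours: float     # files no older than this fall into the tier
    compaction_ratio: float  # per-tier ratio
    min_files: int           # per-tier minimum number of files to compact


def tier_index(f: StoreFile, tiers: List[Tier]) -> int:
    """Return the index of the first tier whose age bound covers the file."""
    for i, tier in enumerate(tiers):
        if f.age_hours <= tier.max_age_hours:
            return i
    return len(tiers) - 1  # anything older lands in the last tier


def select_in_tier(files: List[StoreFile], tier: Tier) -> List[StoreFile]:
    """Simplified ratio-based selection using this tier's own parameters."""
    ordered = sorted(files, key=lambda f: f.age_hours, reverse=True)  # oldest first
    start = 0
    while start < len(ordered):
        newer_total = sum(f.size_bytes for f in ordered[start + 1:])
        if ordered[start].size_bytes <= tier.compaction_ratio * newer_total:
            break  # this file is small enough relative to the newer ones
        start += 1  # too large: leave it out of the compaction
    candidates = ordered[start:]
    return candidates if len(candidates) >= tier.min_files else []


def select_by_tier(files: List[StoreFile], tiers: List[Tier]) -> List[List[StoreFile]]:
    """Bucket files by age, then select within each tier independently."""
    buckets: List[List[StoreFile]] = [[] for _ in tiers]
    for f in files:
        buckets[tier_index(f, tiers)].append(f)
    return [select_in_tier(b, t) for b, t in zip(buckets, tiers)]
```

With a permissive ratio on the recent tier and a strict one on the old tier, small recent files get compacted together while large old files are left alone, which is the behavior the description asks for.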