Add the ability to restrict major-compactible files by timestamp
----------------------------------------------------------------
Key: HBASE-3745
URL: https://issues.apache.org/jira/browse/HBASE-3745
Project: HBase
Issue Type: Improvement
Affects Versions: 0.92.0
Reporter: Todd Lipcon
In some applications, a common access pattern is to frequently scan tables with
a time range predicate restricted to a fairly recent time window. For example,
you may want to do an incremental aggregation or indexing step only on rows
that have changed in the last hour. HBase handles this efficiently by tracking
the min and max timestamp at the HFile level, so that old HFiles don't have to
be read. After a major compaction, however, all of a store's data sits in a
single HFile whose time range spans everything, so the entire dataset has to
be read, which can hurt performance of this access pattern.
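The file-skipping check described above can be sketched roughly as follows.
This is only an illustration, not HBase code: the class and method names are
hypothetical, and it assumes an inclusive [min, max] timestamp range per file
and a half-open [scanMin, scanMax) range for the scan.

```java
/** Hypothetical sketch of time-range based HFile skipping; not an HBase API. */
public class HFileTimeRangeSkip {

    /**
     * Returns true if a file whose cells span [fileMin, fileMax] (inclusive)
     * could contain data for a scan over [scanMin, scanMax) (max exclusive).
     */
    public static boolean mayContain(long fileMin, long fileMax,
                                     long scanMin, long scanMax) {
        return fileMax >= scanMin && fileMin < scanMax;
    }

    /**
     * Counts how many files a time-range scan has to open.
     * Each entry of files is a {minTimestamp, maxTimestamp} pair.
     */
    public static int countFilesToRead(long[][] files, long scanMin, long scanMax) {
        int count = 0;
        for (long[] f : files) {
            if (mayContain(f[0], f[1], scanMin, scanMax)) {
                count++;
            }
        }
        return count;
    }
}
```

With three files covering disjoint time ranges, a scan over the most recent
range opens only the newest file. Once those files are merged into one, that
single file always overlaps the scan range, so all of the data is read.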
We should add a column family attribute that can specify a policy like: "when
major compacting, never include an HFile that contains data with a timestamp in
the last 4 hours." Thus, recently flushed HFiles will always remain uncompacted
and keep providing the good scan performance required for these applications.
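A minimal sketch of the proposed selection rule, assuming each candidate file
exposes its tracked max timestamp. All names here (FileInfo,
selectForMajorCompaction, excludeWindowMillis) are hypothetical and are not
existing HBase APIs or attributes; how the window would actually be configured
on the column family is left open.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of timestamp-restricted compaction selection. */
public class TimestampRestrictedSelection {

    /** Stand-in for per-HFile metadata: a name and the newest cell timestamp. */
    public static final class FileInfo {
        public final String name;
        public final long maxTimestamp;

        public FileInfo(String name, long maxTimestamp) {
            this.name = name;
            this.maxTimestamp = maxTimestamp;
        }
    }

    /**
     * Returns the candidates that hold no data newer than
     * (nowMillis - excludeWindowMillis). Files with any data inside the
     * window are left out of the major compaction.
     */
    public static List<FileInfo> selectForMajorCompaction(
            List<FileInfo> candidates, long nowMillis, long excludeWindowMillis) {
        long cutoff = nowMillis - excludeWindowMillis;
        List<FileInfo> selected = new ArrayList<>();
        for (FileInfo f : candidates) {
            // Keep the file only if its newest cell is older than the cutoff.
            if (f.maxTimestamp < cutoff) {
                selected.add(f);
            }
        }
        return selected;
    }
}
```

With a 4 hour window, a file whose newest cell is 3 hours old would be skipped
and stay available as a small, recent HFile for time-range scans, while a file
whose newest cell is 5 hours old would be compacted as usual.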