Add the ability to restrict major-compactible files by timestamp
----------------------------------------------------------------

                 Key: HBASE-3745
                 URL: https://issues.apache.org/jira/browse/HBASE-3745
             Project: HBase
          Issue Type: Improvement
    Affects Versions: 0.92.0
            Reporter: Todd Lipcon


In some applications, a common access pattern is to frequently scan tables with 
a time range predicate restricted to a fairly recent time window. For example, 
you may want to do an incremental aggregation or indexing step only on rows 
that have changed in the last hour. We do this efficiently by tracking the 
minimum and maximum timestamp of each HFile, so that HFiles containing only 
older data don't have to be read.
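
As a rough illustration of the pruning described above, here is a minimal 
sketch. The class, field, and method names are hypothetical and are not 
HBase's actual API; it only assumes that each file records an inclusive 
[minTs, maxTs] range and that the scan asks for a half-open [scanMin, scanMax) 
range.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for per-HFile time-range metadata; not HBase's own classes. */
final class TimeRangePruningSketch {

    /** Minimum and maximum cell timestamps recorded for one file (both inclusive). */
    static final class FileTimeRange {
        final String name;
        final long minTs;
        final long maxTs;

        FileTimeRange(String name, long minTs, long maxTs) {
            this.name = name;
            this.minTs = minTs;
            this.maxTs = maxTs;
        }
    }

    /**
     * Returns the names of the files whose [minTs, maxTs] range overlaps the
     * scan's [scanMin, scanMax) range. Files entirely outside it are skipped.
     */
    static List<String> filesToRead(List<FileTimeRange> files, long scanMin, long scanMax) {
        List<String> selected = new ArrayList<>();
        for (FileTimeRange f : files) {
            boolean overlaps = f.maxTs >= scanMin && f.minTs < scanMax;
            if (overlaps) {
                selected.add(f.name);
            }
        }
        return selected;
    }
}
```

With one old file and one recently flushed file, a scan over the last hour 
selects only the recent file. If the two were merged into a single file 
covering both ranges, that one file would overlap every recent-window scan.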

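After a major compaction, however, all of a store's data is merged into a 
single HFile whose time range spans the entire dataset. Every such scan then 
has to read the entire dataset, which can hurt performance of this access 
pattern.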

We should add a column family attribute that can specify a policy such as: 
when major compacting, never include an HFile that contains data with a 
timestamp in the last 4 hours. This way, recently flushed HFiles will always 
stay out of major compactions and keep providing the good scan performance 
required by these applications.
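
The proposed policy could be sketched roughly as follows. This is only an 
illustration under stated assumptions, not an existing HBase implementation: 
the class and method names are hypothetical, the exclusion window is passed in 
as a plain parameter rather than read from a column family attribute, and a 
file is judged solely by its maximum timestamp.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of the proposed selection rule; not part of HBase. */
final class RecentFileExclusionSketch {

    /** A store file reduced to its name and newest cell timestamp (ms). */
    static final class CandidateFile {
        final String name;
        final long maxTs;

        CandidateFile(String name, long maxTs) {
            this.name = name;
            this.maxTs = maxTs;
        }
    }

    /**
     * Returns the names of files eligible for major compaction: those whose
     * newest timestamp is strictly older than (nowMs - excludeWindowMs).
     * Files holding any data inside the window are left out.
     */
    static List<String> selectForMajorCompaction(
            List<CandidateFile> files, long nowMs, long excludeWindowMs) {
        long cutoff = nowMs - excludeWindowMs;
        List<String> selected = new ArrayList<>();
        for (CandidateFile f : files) {
            if (f.maxTs < cutoff) {
                selected.add(f.name);
            }
        }
        return selected;
    }
}
```

With a 4-hour window, a file whose newest cell is 9 hours old is selected, 
while a file whose newest cell is 3 hours old is skipped and keeps its narrow 
time range for recent-window scans.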

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira