[
https://issues.apache.org/jira/browse/HBASE-6093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13285267#comment-13285267
]
Matt Corgan commented on HBASE-6093:
------------------------------------
Oops - for flushes you would set all timestamps to the flush start time, like I
said above. But for compactions you would set all timestamps to the
earliest timestamp in the compaction, and ensure that only consecutive files
get compacted together.
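
A minimal sketch of the rule in this comment, in Java. The class and method
names are hypothetical and are not HBase APIs. It assumes each input file is
summarized by a single timestamp, and that "consecutive" means adjacent
positions when files are ordered by that timestamp.

```java
import java.util.List;

/** Hypothetical helper for illustration only; not part of HBase. */
final class FlattenedTimestamps {
    private FlattenedTimestamps() {}

    /** Flush: every cell gets the time at which the flush started. */
    static long forFlush(long flushStartMillis) {
        return flushStartMillis;
    }

    /** Compaction: every cell gets the earliest timestamp among the input files. */
    static long forCompaction(List<Long> inputFileTimestamps) {
        if (inputFileTimestamps.isEmpty()) {
            throw new IllegalArgumentException("compaction needs at least one input file");
        }
        long earliest = Long.MAX_VALUE;
        for (long ts : inputFileTimestamps) {
            earliest = Math.min(earliest, ts);
        }
        return earliest;
    }

    /**
     * Assumed meaning of "consecutive": the selected files occupy adjacent
     * positions (given in ascending order) in the timestamp-ordered file list.
     */
    static boolean areConsecutive(List<Integer> sortedPositions) {
        for (int i = 1; i < sortedPositions.size(); i++) {
            if (sortedPositions.get(i) != sortedPositions.get(i - 1) + 1) {
                return false;
            }
        }
        return true;
    }
}
```

Under these assumptions, compacting only consecutive files means the flattened
timestamp of the output cannot jump past the timestamp of a file that was left
out of the compaction.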
> Flatten timestamps during flush and compaction
> ----------------------------------------------
>
> Key: HBASE-6093
> URL: https://issues.apache.org/jira/browse/HBASE-6093
> Project: HBase
> Issue Type: New Feature
> Components: io, performance, regionserver
> Reporter: Matt Corgan
> Priority: Minor
>
> Many applications run with maxVersions=1 and do not care about timestamps, or
> they will specify one timestamp per row as a normal KeyValue rather than
> per-cell.
> Then, DataBlockEncoders like those in HBASE-4218 and HBASE-4676 often encode
> timestamps as diffs from the previous timestamp or diffs from the minimum
> timestamp in the block. If all timestamps in a block are the same, they will
> all compress to basically <= 8 bytes total per block. This can be a 10% to 25%
> space saving for some schemas, and that saving is realized both on disk and
> in block cache.
> We could add a ColumnFamily setting flattenTimestamps=[true/false]. If true,
> then all timestamps are modified during a flush/compaction to the
> currentTimeMillis() at the start of the flush/compaction. If all timestamps
> are made identical in a file, then the encoder will be able to eliminate them.
> The simplest use case is probably the one where all inserts are type=Put,
> there are no overwrites, and there are no deletes. As use cases get more
> complex, so does the implementation.
> For example, what happens when there is a Put and a Delete of the same cell
> in the same memstore? Maybe for a flush at t=flushStartTime, the Put gets
> timestamp=t, and the Delete gets timestamp=t+1. Or maybe HBASE-4241 could
> take care of this problem.
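
The space argument in the description can be sketched with a simplified size
model, in Java. This is not the actual encoder from HBASE-4218 or HBASE-4676.
It assumes one 8-byte minimum timestamp per block plus a fixed-width diff per
cell, and non-negative timestamps. The class and method names are hypothetical.

```java
/** Hypothetical size model for illustration only; not an HBase encoder. */
final class TimestampDeltaSize {
    private TimestampDeltaSize() {}

    /**
     * Bytes needed to store a block's timestamps as an 8-byte minimum
     * plus one fixed-width diff-from-minimum per cell.
     */
    static long encodedBytes(long[] timestamps) {
        if (timestamps.length == 0) {
            return 0;
        }
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (long ts : timestamps) {
            min = Math.min(min, ts);
            max = Math.max(max, ts);
        }
        long range = max - min; // assumes non-negative timestamps, so no overflow
        // Bits needed for the largest diff; 0 when all timestamps are equal.
        int bitsPerDiff = 64 - Long.numberOfLeadingZeros(range);
        int bytesPerDiff = (bitsPerDiff + 7) / 8;
        return 8L + (long) bytesPerDiff * timestamps.length;
    }
}
```

In this model a block whose timestamps are all identical costs 8 bytes however
many cells it holds, while unflattened timestamps cost at least one extra byte
per cell. The Put at t and Delete at t+1 idea gives a range of 1, so it would
still cost one byte per cell here rather than zero.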