[ https://issues.apache.org/jira/browse/HBASE-6093?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Andrew Purtell resolved HBASE-6093.
-----------------------------------
    Resolution: Incomplete

> Flatten timestamps during flush and compaction
> ----------------------------------------------
>
>                 Key: HBASE-6093
>                 URL: https://issues.apache.org/jira/browse/HBASE-6093
>             Project: HBase
>          Issue Type: New Feature
>          Components: io, Performance, regionserver
>            Reporter: Matt Corgan
>            Priority: Minor
>
> Many applications run with maxVersions=1 and do not care about timestamps, or 
> they store a single timestamp per row as an ordinary KeyValue rather than 
> relying on per-cell timestamps.
> DataBlockEncoders like those in HBASE-4218 and HBASE-4676 often encode 
> timestamps as diffs from the previous timestamp or from the minimum timestamp 
> in the block.  If all timestamps in a block are the same, they will together 
> compress to roughly 8 bytes or fewer per block.  This can be a 10% to 25% 
> space savings for some schemas, and that savings is realized both on disk and 
> in block cache.
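
To make the arithmetic above concrete, here is a minimal sketch of a
diff-from-minimum size estimate. It is a simplified model and not the actual
encoders from HBASE-4218 or HBASE-4676: the class and method names are
hypothetical, it assumes one 8-byte minimum per block plus a fixed-width diff
per cell, and it assumes non-negative timestamps.

```java
// Hypothetical model of diff-from-minimum timestamp encoding (not HBase code).
final class TimestampDeltaSketch {

    /** Bytes needed to store each (timestamp - min) at a fixed width; 0 if all are equal. */
    static int diffWidthBytes(long[] timestamps) {
        if (timestamps.length == 0) {
            return 0;
        }
        long min = Long.MAX_VALUE;
        long max = Long.MIN_VALUE;
        for (long ts : timestamps) {
            min = Math.min(min, ts);
            max = Math.max(max, ts);
        }
        long maxDiff = max - min; // assumes non-negative timestamps, so no overflow
        int bits = 64 - Long.numberOfLeadingZeros(maxDiff); // 0 bits when maxDiff == 0
        return (bits + 7) / 8;
    }

    /** Estimated bytes for all timestamps in a block: 8 for the minimum, plus one diff per cell. */
    static long encodedBytes(long[] timestamps) {
        if (timestamps.length == 0) {
            return 0;
        }
        return 8L + (long) timestamps.length * diffWidthBytes(timestamps);
    }
}
```

Under this model a block whose timestamps are all identical costs 8 bytes for
timestamps no matter how many cells it holds, while any spread between the
smallest and largest timestamp adds at least one byte per cell.
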
> We could add a ColumnFamily setting flattenTimestamps=[true/false].  If true, 
> then all timestamps are modified during a flush/compaction to the 
> currentTimeMillis() at the start of the flush/compaction.  If all timestamps 
> are made identical in a file, then the encoder will be able to eliminate them.
> The simplest use case is probably the one where all inserts are type=Put, 
> there are no overwrites, and there are no deletes.  As use cases get more 
> complex, so does the implementation.
> For example, what happens when there is a Put and a Delete of the same cell 
> in the same memstore?  Maybe for a flush at t=flushStartTime, the Put gets 
> timestamp=t, and the Delete gets timestamp=t+1.  Or maybe HBASE-4241 could 
> take care of this problem.
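
The "Put gets t, Delete gets t+1" idea from the description could be sketched
roughly as follows. This only illustrates that one option and is not an HBase
implementation: the Cell class, the Type enum, and the flatten method are
hypothetical stand-ins, and flushStartTime is assumed to be captured once via
System.currentTimeMillis() when the flush begins.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of timestamp flattening at flush time (not HBase code).
final class FlattenTimestampsSketch {

    enum Type { PUT, DELETE }

    /** Minimal stand-in for a KeyValue: row, qualifier, timestamp, and type only. */
    static final class Cell {
        final String row;
        final String qualifier;
        final long timestamp;
        final Type type;

        Cell(String row, String qualifier, long timestamp, Type type) {
            this.row = row;
            this.qualifier = qualifier;
            this.timestamp = timestamp;
            this.type = type;
        }
    }

    /**
     * Rewrites every timestamp relative to flushStartTime:
     * Puts get t, Deletes get t + 1 so the Delete still sorts as newer.
     */
    static List<Cell> flatten(List<Cell> memstoreCells, long flushStartTime) {
        List<Cell> flattened = new ArrayList<>(memstoreCells.size());
        for (Cell cell : memstoreCells) {
            long ts = (cell.type == Type.DELETE) ? flushStartTime + 1 : flushStartTime;
            flattened.add(new Cell(cell.row, cell.qualifier, ts, cell.type));
        }
        return flattened;
    }
}
```

Note that this sketch discards the original ordering: a Put written after the
Delete in the same memstore would also end up at t and be masked by the Delete
at t+1. That is one example of how the more complex use cases complicate the
implementation.
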



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
