[ 
https://issues.apache.org/jira/browse/HBASE-8709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sergey Shelukhin resolved HBASE-8709.
-------------------------------------

    Resolution: Invalid

Scratch that: after the first out-of-order choice, the file seqnum ranges can overlap.
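
A small sketch of why the non-overlap assumption in step 1 breaks, using made-up numbers. SeqRange and its methods are hypothetical names for illustration, not HBase classes, and the output range of a compaction is assumed to be the min/max of its inputs.

```java
// Illustrative sketch only; SeqRange is a hypothetical type, not an HBase class.
final class SeqRangeOverlapSketch {
    /** Inclusive [min, max] seqnum range stored for one file. */
    static final class SeqRange {
        final long min;
        final long max;

        SeqRange(long min, long max) {
            this.min = min;
            this.max = max;
        }

        /** Range of the file produced by compacting this file with another (assumed min/max). */
        SeqRange compactWith(SeqRange other) {
            return new SeqRange(Math.min(min, other.min), Math.max(max, other.max));
        }

        /** True if the two inclusive ranges share at least one seqnum. */
        boolean overlaps(SeqRange other) {
            return min <= other.max && other.min <= max;
        }
    }

    public static void main(String[] args) {
        SeqRange f1 = new SeqRange(1, 10);
        SeqRange f2 = new SeqRange(11, 20);
        SeqRange f3 = new SeqRange(21, 30);

        // Out-of-order choice: compact f1 with f3, skipping f2.
        SeqRange f13 = f1.compactWith(f3);
        System.out.println("f1+f3 overlaps f2: " + f13.overlaps(f2));
    }
}
```

Here the compacted file covers [1, 30], which contains f2's [11, 20], so a per-file median seqnum can no longer order the two files' KVs.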
                
> consider a scheme to allow compacting files in any combination
> --------------------------------------------------------------
>
>                 Key: HBASE-8709
>                 URL: https://issues.apache.org/jira/browse/HBASE-8709
>             Project: HBase
>          Issue Type: Brainstorming
>          Components: Compaction, HFile
>            Reporter: Sergey Shelukhin
>
> We were discussing something and I came up with the following scheme. 
> The main problem with choosing out-of-order files for compactions is full 
> key collisions (k, cf:c, ts are the same). We rely on the file seqnum to 
> resolve these. There'd be no problem if we stored a seqnum for each KV, but 
> that is overkill. What we can do is this:
> 1) Store min seqnum for a file together with the max. Assume file seqnum 
> ranges don't overlap.
> 2) Store seqnum for each KV in memory of the memstore.
> 3) On flush, don't write out seqnums unless there's a full conflict inside 
> this memstore. Unfortunately, we will have to change the file format to tack 
> on a bit somewhere to indicate that a varint seqnum is present.
> 4) On compaction, when dropping versions we can drop these seqnums.
> 5) On compaction, if we see a full conflict with no seqnums (i.e. KVs coming 
> from different files), write out seqnums for the KVs involved as the median 
> of the respective file ranges (or something like that). We only ever use 
> these seqnums to resolve full conflicts, so we don't care about relations 
> between keys. 
> In both places where we write seqnums, we will need to see the next KV 
> before writing the previous KV, so there's some complexity. However, the 
> "buffering" is never more than one KV long: if we see a different k-c-t, we 
> know we don't need a seqnum unless we are in conflict with previous KVs that 
> we have already written; if we see the same one, we know we need a seqnum. 
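
The one-KV buffering described above could look roughly like the following. This is only a sketch: Cell, Written, write and midpointSeqnum are hypothetical names rather than HBase classes or methods, the input is assumed to be already sorted by full key, and "median of the file range" is taken to mean the midpoint of [min, max].

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only; Cell and Written are hypothetical types, not HBase classes.
final class SeqnumLookaheadSketch {
    /** A KV reduced to its full key (k, cf:c, ts); the value is omitted. */
    static final class Cell {
        final String row;
        final String column;
        final long ts;

        Cell(String row, String column, long ts) {
            this.row = row;
            this.column = column;
            this.ts = ts;
        }

        boolean sameFullKey(Cell o) {
            return row.equals(o.row) && column.equals(o.column) && ts == o.ts;
        }
    }

    /** What the writer emits: the cell plus whether a seqnum is stored with it. */
    static final class Written {
        final Cell cell;
        final boolean hasSeqnum;

        Written(Cell cell, boolean hasSeqnum) {
            this.cell = cell;
            this.hasSeqnum = hasSeqnum;
        }
    }

    /** Decides per KV whether a seqnum is needed, holding back at most one KV. */
    static List<Written> write(List<Cell> sortedCells) {
        List<Written> out = new ArrayList<>();
        Cell pending = null;                  // the single buffered KV
        boolean conflictsWithPrev = false;    // pending collided with an already-written KV
        for (Cell next : sortedCells) {
            if (pending != null) {
                boolean conflictsWithNext = pending.sameFullKey(next);
                out.add(new Written(pending, conflictsWithPrev || conflictsWithNext));
                conflictsWithPrev = conflictsWithNext;
            }
            pending = next;
        }
        if (pending != null) {
            out.add(new Written(pending, conflictsWithPrev));
        }
        return out;
    }

    /** Step 5 seqnum for a KV from a file with range [minSeq, maxSeq] (midpoint is an assumption). */
    static long midpointSeqnum(long minSeq, long maxSeq) {
        return minSeq + (maxSeq - minSeq) / 2;
    }

    public static void main(String[] args) {
        List<Cell> cells = Arrays.asList(
            new Cell("r1", "cf:a", 5), new Cell("r1", "cf:a", 5), new Cell("r2", "cf:a", 5));
        for (Written w : write(cells)) {
            System.out.println(w.cell.row + " hasSeqnum=" + w.hasSeqnum);
        }
    }
}
```

Both r1 cells collide on the full key and get a seqnum; the r2 cell does not, so the common case pays no extra bytes.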

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
