[ https://issues.apache.org/jira/browse/HBASE-8709?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Sergey Shelukhin resolved HBASE-8709.
-------------------------------------
Resolution: Invalid
Scratch that: after the first out-of-order choice, the file seqnum ranges can overlap.
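Here is a tiny model of why the scheme breaks. It assumes each store file is summarized only by the inclusive (min, max) seqnum range from step 1 of the proposal quoted below. `compact_range` and `overlaps` are illustrative helpers, not HBase APIs.

```python
def compact_range(*ranges):
    """Seqnum range (min, max) of the file produced by compacting the inputs."""
    return (min(lo for lo, _ in ranges), max(hi for _, hi in ranges))


def overlaps(a, b):
    """True if two inclusive (min, max) seqnum ranges share any seqnum."""
    return a[0] <= b[1] and b[0] <= a[1]


# Three flushes, oldest to newest.
f1, f2, f3 = (1, 10), (11, 20), (21, 30)

print(overlaps(compact_range(f1, f2), f3))  # False: in-order choice keeps ranges disjoint
print(overlaps(compact_range(f1, f3), f2))  # True: skipping f2 yields (1, 30), which covers f2
```

Once the merged file's range swallows f2's, a median taken from either range no longer orders KVs correctly across those files. That is the assumption the resolution says fails.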
> consider a scheme to allow compacting files in any combination
> --------------------------------------------------------------
>
> Key: HBASE-8709
> URL: https://issues.apache.org/jira/browse/HBASE-8709
> Project: HBase
> Issue Type: Brainstorming
> Components: Compaction, HFile
> Reporter: Sergey Shelukhin
>
> We were discussing something and I came up with the following scheme.
> Consider this. The main problem with choosing out-of-order files for
> compaction is full key collisions (k, cf:c and ts are all the same). We rely on
> the file seqnum to resolve these. There'd be no problem if we stored a seqnum for
> each KV, but that is overkill. What we can do is this:
> 1) Store min seqnum for a file together with the max. Assume file seqnum
> ranges don't overlap.
> 2) Store the seqnum for each KV in memory, in the memstore.
> 3) On flush, don't write out seqnums unless there's a full conflict inside
> this memstore. Unfortunately, we will have to change the file format to tuck a
> bit in somewhere indicating that a varint seqnum is present.
> 4) On compaction, when dropping versions we can drop these seqnums.
> 5) On compaction, if we see a full conflict with no seqnums (i.e. KVs coming
> from different files), write out seqnums for the KVs involved as the median of
> their respective file ranges (or something like that). We only ever use these
> seqnums to resolve full conflicts, so we don't care how they relate across
> different keys.
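Steps 1, 4 and 5 can be modeled roughly as follows. The sketch assumes a store file is just its (min, max) seqnum range plus a list of cells, and that a missing seqnum means "derive it from the file's range". `Cell`, `StoreFile`, `median_seqnum` and `compact` are hypothetical names, not HBase classes. The integer midpoint stands in for "median of the file range". Version dropping is omitted, but a cell that no longer collides loses its seqnum, in the spirit of step 4.

```python
from itertools import groupby
from typing import NamedTuple, Optional


class Cell(NamedTuple):
    row: str
    column: str                   # "cf:c"
    ts: int
    value: str
    seqnum: Optional[int] = None  # stored only for full conflicts


class StoreFile(NamedTuple):
    min_seq: int                  # step 1: min seqnum kept next to the max
    max_seq: int
    cells: list


def median_seqnum(f: StoreFile) -> int:
    # Stand-in for "median of the file range"; only meaningful if ranges are disjoint.
    return (f.min_seq + f.max_seq) // 2


def compact(files: list) -> StoreFile:
    """Merge files, writing seqnums only where full keys (row, column, ts) collide."""
    tagged = sorted(((c, f) for f in files for c in f.cells),
                    key=lambda t: (t[0].row, t[0].column, -t[0].ts))
    out = []
    for _, group in groupby(tagged, key=lambda t: (t[0].row, t[0].column, t[0].ts)):
        group = list(group)
        if len(group) == 1:
            # No collision left, so the seqnum is not needed (cf. step 4).
            out.append(group[0][0]._replace(seqnum=None))
            continue
        # Step 5: keep an explicit seqnum, or synthesize one from the source file's range.
        resolved = [c._replace(seqnum=c.seqnum if c.seqnum is not None else median_seqnum(f))
                    for c, f in group]
        resolved.sort(key=lambda c: c.seqnum, reverse=True)  # newest first
        out.extend(resolved)
    return StoreFile(min(f.min_seq for f in files), max(f.max_seq for f in files), out)


a = StoreFile(1, 10, [Cell("r1", "cf:c", 100, "old")])
b = StoreFile(11, 20, [Cell("r1", "cf:c", 100, "new"), Cell("r2", "cf:c", 100, "x")])
z = compact([a, b])
# z.cells: "new" (seqnum 15), "old" (seqnum 5), "x" (no seqnum)
```

The midpoints 5 and 15 order the colliding cells correctly only because the ranges 1..10 and 11..20 are disjoint. An out-of-order compaction invalidates exactly that property.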
> In both places where we write seqnums, we need to see the next KV before
> writing the previous one, which adds some complexity. However, the
> "buffering" is never more than one KV long: if the next KV has a different
> k-c-t, we know the previous KV needs no seqnum unless it conflicts with earlier
> KVs that we have already written; if the next KV has the same k-c-t, we know
> we need a seqnum.
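The one-KV lookahead can be sketched as a generator over the memstore's cells during flush (steps 2 and 3). It assumes the cells arrive sorted by full key (row, column, ts) with the newest first, and that each carries its in-memory seqnum. The function name and tuple layout are hypothetical. Yielding `None` stands in for leaving the step 3 "seqnum present" bit unset.

```python
def flush_with_sparse_seqnums(cells):
    """cells: iterable of (full_key, seqnum), sorted by full key, newest first.
    Yields (full_key, seqnum or None); a seqnum is kept only if the cell's full
    key equals that of a neighbour. At most one cell is buffered."""
    pending = None             # the single buffered cell
    pending_conflicts = False  # does it collide with the cell already written?
    for key, seq in cells:
        if pending is not None:
            same = pending[0] == key
            # Seeing the next cell tells us whether the buffered one needs its seqnum.
            yield pending[0], (pending[1] if same or pending_conflicts else None)
            pending_conflicts = same
        pending = (key, seq)
    if pending is not None:
        yield pending[0], (pending[1] if pending_conflicts else None)


memstore = [
    (("r1", "cf:c", 100), 7),
    (("r1", "cf:c", 100), 3),  # full conflict with the cell above
    (("r2", "cf:c", 100), 5),
]
print(list(flush_with_sparse_seqnums(memstore)))  # r1 cells keep 7 and 3; r2 gets None
```

The `pending_conflicts` flag covers the "in conflict with previous KVs which we have already written" case. The last cell of a run of colliding cells still keeps its seqnum even though the next key differs.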