[ 
https://issues.apache.org/jira/browse/HBASE-8701?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13690072#comment-13690072
 ] 

Jeffrey Zhong commented on HBASE-8701:
--------------------------------------

{quote}
But some can be -ve? So they will be out of order? 
{quote}
The negative mvcc is only used in KV comparison, to resolve conflicts when row + 
timestamp are the same. The hfile sequence number used in a flush comes from the 
current region server log sequence number, not from the mvcc value. KVs with the 
same key + timestamp can be out of order across multiple store files, but the 
right KV is still selected with the help of these negative mvcc values (the 
original log sequence numbers). (See KVScannerComparator#compare.)
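
To illustrate the tie-break described above, here is a minimal sketch. It is not the actual KVScannerComparator code: the Cell class, VISIBLE_FIRST, and winner are hypothetical names, and the sketch assumes that among replayed edits with the same row + timestamp, the one with the larger original log sequence number should be selected.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

final class ReplayOrderSketch {
    /** Hypothetical, simplified stand-in for a KeyValue (not the HBase class). */
    static final class Cell {
        final String row;
        final long timestamp;
        final long mvcc; // negative: negated original log sequence number
        final String value;

        Cell(String row, long timestamp, long mvcc, String value) {
            this.row = row;
            this.timestamp = timestamp;
            this.mvcc = mvcc;
            this.value = value;
        }
    }

    /** Recovers the original sequence number; this sketch models replayed edits only. */
    static long originalSeq(Cell c) {
        if (c.mvcc >= 0) {
            throw new IllegalArgumentException("sketch only models replayed edits");
        }
        return -c.mvcc;
    }

    /** Orders cells so that the one a reader should see sorts first. */
    static final Comparator<Cell> VISIBLE_FIRST = (a, b) -> {
        int c = a.row.compareTo(b.row);
        if (c != 0) return c;
        c = Long.compare(b.timestamp, a.timestamp); // newer timestamp first
        if (c != 0) return c;
        // Same row + timestamp: larger original sequence number first (assumption).
        return Long.compare(originalSeq(b), originalSeq(a));
    };

    /** Returns the value of the cell that sorts first. */
    static String winner(List<Cell> cells) {
        List<Cell> sorted = new ArrayList<>(cells);
        sorted.sort(VISIBLE_FIRST);
        return sorted.get(0).value;
    }

    public static void main(String[] args) {
        List<Cell> cells = new ArrayList<>();
        cells.add(new Cell("row", 0L, -101L, "B")); // original seq 101
        cells.add(new Cell("row", 0L, -100L, "A")); // original seq 100
        System.out.println(winner(cells));
    }
}
```

The order in which the two cells reach the comparator does not matter here, which is the point: they can sit in different store files and the later edit is still picked.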

{quote}
Is that safe presumption to make in replay?
Is this the least sequenceid of the batch?
Do we have to add it to WALEdit at all?
{quote}
I checked the wal serialization code: the kv isn't PBed and we don't write mvcc 
values into the wal. Therefore, I still need to add the "original log sequence 
number" to the hlogkey of each wal entry on the receiving RS, to maintain the 
original order of changes when these edits are replayed again in a chained RS 
failure scenario.
During replay, we can use the last sequence id of a batch because a batch 
request comes from a single wal and the relative order of changes is maintained 
when constructing the batch request. So the last KV wins if there are multiple 
KVs with the same key + timestamp.
I don't have to put the "original seq number" into WALEdit, but I do need to 
change several method signatures to pass the info down.
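
As a rough sketch of the batch reasoning above (hypothetical names only, not the real replay code): a batch built from a single wal keeps its edits in wal order, the batch is tagged with the sequence id of its last edit, and applying the edits in order makes the last KV win for a given key + timestamp.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class BatchReplaySketch {
    /** Hypothetical stand-in for one replayed wal edit. */
    static final class Edit {
        final String row;
        final long timestamp;
        final String value;
        final long origSeq; // sequence number in the original wal

        Edit(String row, long timestamp, String value, long origSeq) {
            this.row = row;
            this.timestamp = timestamp;
            this.value = value;
            this.origSeq = origSeq;
        }
    }

    /** The batch is tagged with the sequence id of its last edit. */
    static long batchSequenceId(List<Edit> batch) {
        if (batch.isEmpty()) {
            throw new IllegalArgumentException("empty batch");
        }
        return batch.get(batch.size() - 1).origSeq;
    }

    /** Applies edits in wal order; a later edit overwrites an earlier one. */
    static Map<String, String> apply(List<Edit> batch) {
        Map<String, String> state = new LinkedHashMap<>();
        for (Edit e : batch) {
            state.put(e.row + "/" + e.timestamp, e.value);
        }
        return state;
    }

    public static void main(String[] args) {
        List<Edit> batch = Arrays.asList(
                new Edit("row", 0L, "A", 100L),
                new Edit("row", 0L, "B", 101L));
        System.out.println(batchSequenceId(batch));
        System.out.println(apply(batch).get("row/0"));
    }
}
```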


                
> distributedLogReplay need to apply wal edits in the receiving order of those 
> edits
> ----------------------------------------------------------------------------------
>
>                 Key: HBASE-8701
>                 URL: https://issues.apache.org/jira/browse/HBASE-8701
>             Project: HBase
>          Issue Type: Bug
>          Components: MTTR
>            Reporter: Jeffrey Zhong
>            Assignee: Jeffrey Zhong
>             Fix For: 0.98.0, 0.95.2
>
>         Attachments: 8701-v3.txt, hbase-8701-v4.patch, hbase-8701-v5.patch, 
> hbase-8701-v6.patch, hbase-8701-v7.patch
>
>
> This issue happens in distributedLogReplay mode when recovering multiple puts 
> of the same key + version (timestamp). After replay, the value of the key is 
> nondeterministic.
> h5. The original concern raised by [~eclark]:
> For all edits the rowkey is the same.
> There's a log with: [ A (ts = 0), B (ts = 0) ]
> Replay the first half of the log.
> A user puts in C (ts = 0)
> Memstore has to flush
> A new Hfile will be created with [ C, A ] and MaxSequenceId = C's seqid.
> Replay the rest of the Log.
> Flush
> The issue will also happen in similar situations, such as Put(key, t=T) in 
> WAL1 and Put(key, t=T) in WAL2.
> h5. Below is the option (proposed by Ted) I'd like to use:
> a) During replay, we pass the original wal sequence number of each edit to 
> the receiving RS.
> b) In the receiving RS, we store the negated original sequence number of each 
> wal edit in the mvcc field of that edit's KVs.
> c) Add handling of negative MVCC in KVScannerComparator and KVComparator.
> d) In the receiving RS, write the original sequence number into an optional 
> field of the wal file, for the chained RS failure situation.
> e) When opening a region, we add a safety bumper (a large number) so that 
> the new sequence numbers of a newly opened region do not collide with old 
> sequence numbers.
> In the future, when we store sequence numbers along with KVs, we can adjust 
> the above solution a little so that it no longer overloads the MVCC field.
> h5. The other alternative options are listed below for reference:
> Option one
> a) Disallow writes during recovery.
> b) During replay, we pass original wal sequence ids.
> c) Hold flushes until all wals of a recovering region are replayed. The 
> memstore should hold because we only recover unflushed wal edits. For edits 
> with the same key + version, whichever has the larger sequence id wins.
> Option two
> a) During replay, we pass original wal sequence ids.
> b) For each wal edit, we store the edit's original sequence id along with 
> its key.
> c) During scanning, we use the original sequence id if it's present, 
> otherwise the store file's sequence id.
> d) Compaction can just keep the put with the max sequence id.
> Please let me know if you have better ideas.
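
For step e) of the chosen option in the description above, a minimal sketch of the "safety bumper" idea might look like this. The class name, the method name, and the bumper value are all hypothetical; the description only says "a large number".

```java
final class SafetyBumperSketch {
    // Hypothetical value; the issue description only calls for "a large number".
    static final long SAFETY_BUMPER = 1_000_000L;

    /**
     * Picks the first sequence id for a newly opened region so that it sits
     * well above every sequence id already seen for that region.
     */
    static long firstSequenceIdOnOpen(long maxSeenSequenceId) {
        // addExact throws ArithmeticException instead of silently overflowing.
        return Math.addExact(Math.addExact(maxSeenSequenceId, SAFETY_BUMPER), 1L);
    }

    public static void main(String[] args) {
        System.out.println(firstSequenceIdOnOpen(1_000L));
    }
}
```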

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
