[ 
https://issues.apache.org/jira/browse/HBASE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000455#comment-14000455
 ] 

Jeffrey Zhong commented on HBASE-8763:
--------------------------------------

Thanks [~saint....@gmail.com] for the good comments! I'll make corresponding 
changes in my next patch. Please see my answers below:

{quote}
Hmm... maybe we should make the writeQueue a disruptor too? Later.
{quote}
That's a good idea. I could also remove the writeQueue even without disruptor. 
Basically return the previous highestUnsyncedSequence from disruptor consumer 
code where we stampRegionSequenceId in FSHLog#append() before current wal 
append, inside syncOrDefer we will wait for this previous 
highestUnsyncedSequence for skip_wal & async_wal case. Let's do it later in a 
separate JIRA.

{quote}
Should beginMemstoreInsert assign a write mvcc number at all?
{quote}
No need to assign a write number and I could pass 0 here. The main purpose for 
the call is inserting a marker so that later code can wait all mvcc 
transactions before marker complete. this.sequenceId.incrementAndGet() is just 
a way to pass some "good" number without messing up the memstore read point 
later by mvcc.advanceMemstore(w) call. 
I'll pass HLog.NO_SEQUENCE_ID value then.

{quote}
Misspelling: waitForPreviousTransactoinsComplete
{quote}
Good catch.

{quote}
Call this appendNoSyncFakedWALEdit appendNoSyncNoAppend
{quote}
Ok. I'll make the change

{quote}
It is convenient updating once and all related get the update. Is that the only 
reason?
{quote}
It's not only for convenient but for correctness. Let's say we have two updates 
coming in the order of c1, c2 while c1 could have seqId=2 and c2 could have 
seqId=1. As you know wal syncer syncs all available pending appends and it's 
likely both wal entries for e1 and e2 are synced at same time. Therefore, the 
mvcc for c1 will advance memstore read point to 2. Since MutableLong is used 
here, so we know for sure c1's MVCC has been updated to 1 before c2 gets its 
seqid=2 otherwise we could end up with the situation that memstore read point 
has been set to 2 while c1's mvcc in menstore hasn't been updated yet depending 
on the caller thread scheduling.(Unless I pass MVCC into ring buffer where we 
keep the reference of all new KVs and update mvcc for them in disruptor 
consumer code) .
 
{quote}
Set top bits rather than add a big number?: + curSeqNum.setValue(originalVal + 
1000000000);
{quote}
How about the case if the first bit is already used for a large sequence 
number? In theory, I only need to bump the number to 2 * the number of rpc 
handlers because the writeQueue will be blocked if current mvcc isn't complete. 
The big number here is just to be safe.

{quote}
Do we need to have MemStore know about HLogKeys?
Could we do the wait in the WAL system before we call 
completeMemstoreInsertWithSeqNum passing in the sequence id to use? Could the 
consumer on the ring buffer call the Memstore. completeMemstoreInsertWithSeqNum?
{quote}
Memstore doesn't know hlogkey but MVCC. I can do it outside of MVCC but that 
requires all places calling completeMemstoreInsertWithSeqNum need to remember 
to call the waitForLogSequence. It may leave hole in the future. 


> [BRAINSTORM] Combine MVCC and SeqId
> -----------------------------------
>
>                 Key: HBASE-8763
>                 URL: https://issues.apache.org/jira/browse/HBASE-8763
>             Project: HBase
>          Issue Type: Improvement
>          Components: regionserver
>            Reporter: Enis Soztutar
>            Assignee: Jeffrey Zhong
>            Priority: Critical
>         Attachments: HBase MVCC & LogSeqId Combined.pdf, 
> hbase-8736-poc.patch, hbase-8763-poc-v1.patch, hbase-8763-v1.patch, 
> hbase-8763-v2.patch, hbase-8763-v3.patch, hbase-8763_wip1.patch
>
>
> HBASE-8701 and a lot of recent issues include good discussions about mvcc + 
> seqId semantics. It seems that having mvcc and the seqId complicates the 
> comparator semantics a lot in regards to flush + WAL replay + compactions + 
> delete markers and out of order puts. 
> Thinking more about it I don't think we need a MVCC write number which is 
> different than the seqId. We can keep the MVCC semantics, read point and 
> smallest read points intact, but combine mvcc write number and seqId. This 
> will allow cleaner semantics + implementation + smaller data files. 
> We can do some brainstorming for 0.98. We still have to verify that this 
> would be semantically correct, it should be so by my current understanding.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to