[ https://issues.apache.org/jira/browse/HBASE-8763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13839728#comment-13839728 ]
Jeffrey Zhong commented on HBASE-8763:
--------------------------------------
{quote}
What will we do if two edits arrive with the same coordinates? How will we
distinguish them if both have long.max during the time it takes to sync and
convert long.max to a legit seqid?
{quote}
Basically, the MVCC write number only needs to make sure scanners can't see
edits before a write is done. Therefore we can assign them Long.MAX_VALUE. It
means all in-progress writes belong to one bucket that scanners can't see.
Once a write is done, we assign it the logSeqNumber in WAL appending order and
then bump up the min read point so that all writes before the current log
sequence number are visible to scanners. In this case, clients see changes in
the order we commit the writes.
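
A minimal sketch of the scheme described above, to make the visibility rule concrete. The class and member names (SeqIdMvccSketch, Edit, complete, isVisible) are hypothetical and are not taken from HBase or from the proof-of-concept patch. As a simplifying assumption, one synchronized method stands in for "the WAL append finished and returned a sequence number".

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch only; not HBase's MultiVersionConsistencyControl. */
class SeqIdMvccSketch {
    /** Placeholder write number shared by all in-progress edits. */
    static final long IN_PROGRESS = Long.MAX_VALUE;

    static final class Edit {
        final String value;
        // Stays at Long.MAX_VALUE until the write is done, so no read point reaches it.
        volatile long writeNumber = IN_PROGRESS;
        Edit(String value) { this.value = value; }
    }

    private final AtomicLong logSeqNum = new AtomicLong(0);
    private volatile long readPoint = 0;

    /** Assumption: called once per edit, in the order the WAL appends complete. */
    synchronized void complete(Edit e) {
        e.writeNumber = logSeqNum.incrementAndGet(); // seqId assigned in commit order
        readPoint = e.writeNumber;                   // bump the read point
    }

    /** A scanner sees an edit only if its write number is at or below the read point. */
    boolean isVisible(Edit e) { return e.writeNumber <= readPoint; }

    long readPoint() { return readPoint; }

    public static void main(String[] args) {
        SeqIdMvccSketch mvcc = new SeqIdMvccSketch();
        Edit put1 = new Edit("v1");
        Edit put2 = new Edit("v2");
        System.out.println(mvcc.isVisible(put2)); // false: still in progress
        mvcc.complete(put2); // put2 commits first and gets seqId 1
        mvcc.complete(put1); // put1 commits second and gets seqId 2
        System.out.println(put2.writeNumber + " " + put1.writeNumber); // 1 2
    }
}
```

In this sketch the write number reflects only the commit order: Put2 arrived second but committed first, so it gets the smaller number.
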
There are two orders in today's code because we assign the write number before
a write starts: receiving order and commit order. For example, Put1 has write
number 1 and Put2 has write number 2. Put2 can finish earlier than Put1, but
Put2 still needs to wait for Put1 to finish. This causes issues for replication
and recovery because both rely on the order (commit order) in the WAL file.
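
The Put1/Put2 example can be reproduced with a small sketch of a receive-order queue. This is an illustration under assumptions, not the actual HBase code: the class ReceiveOrderMvccSketch and its begin/complete methods are hypothetical names, and the read point is simplified to a single long.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch of write numbers assigned in receiving order. */
class ReceiveOrderMvccSketch {
    static final class WriteEntry {
        final long writeNumber;
        boolean completed;
        WriteEntry(long writeNumber) { this.writeNumber = writeNumber; }
    }

    private final Deque<WriteEntry> writeQueue = new ArrayDeque<>();
    private long nextWriteNumber = 0;
    private long readPoint = 0;

    /** The write number is fixed when the write starts (receiving order). */
    synchronized WriteEntry begin() {
        WriteEntry e = new WriteEntry(++nextWriteNumber);
        writeQueue.addLast(e);
        return e;
    }

    /**
     * Marks e as done, then advances the read point only across the completed
     * prefix of the queue. A later write that finishes early stays invisible
     * until every earlier write has finished. Returns the new read point.
     */
    synchronized long complete(WriteEntry e) {
        e.completed = true;
        while (!writeQueue.isEmpty() && writeQueue.peekFirst().completed) {
            readPoint = writeQueue.removeFirst().writeNumber;
        }
        return readPoint;
    }

    public static void main(String[] args) {
        ReceiveOrderMvccSketch mvcc = new ReceiveOrderMvccSketch();
        WriteEntry put1 = mvcc.begin(); // write number 1
        WriteEntry put2 = mvcc.begin(); // write number 2
        System.out.println(mvcc.complete(put2)); // 0: Put2 is done but waits for Put1
        System.out.println(mvcc.complete(put1)); // 2: both become visible together
    }
}
```
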
{quote}
What are the two locks J?
{quote}
In the file MultiVersionConsistencyControl, the locks guard access to
writeQueue. Since we no longer need to keep the receiving order (which we have
to today because a larger write number could complete earlier than a smaller
one), we can remove the related code, as you can see in my proof-of-concept
patch: beginMemstoreInsertUseSeqNum & advanceMemstoreUseSeqNum. I still keep a
collection, inProgressWrites, because our Increment, Append, etc. need all
in-progress writes done, but this part can be optimized by just keeping a
hashmap of rows whose row locks are released but not yet WAL synced.
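
One possible shape for that hashmap optimization, as a rough sketch. Everything here is an assumption made for illustration: the class PendingRowsSketch, its method names, and the use of a String row key do not come from the patch.

```java
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical sketch: track rows whose row lock is released but whose WAL sync is pending. */
class PendingRowsSketch {
    // row key -> number of writes on that row still waiting for a WAL sync
    private final ConcurrentHashMap<String, Integer> pending = new ConcurrentHashMap<>();

    /** Called when a write releases its row lock before the WAL sync finishes. */
    void rowLockReleased(String row) {
        pending.merge(row, 1, Integer::sum);
    }

    /** Called when the WAL sync for a write on this row finishes. */
    void walSynced(String row) {
        // Returning null from the remapping function removes the entry.
        pending.computeIfPresent(row, (k, n) -> n == 1 ? null : n - 1);
    }

    /** An Increment or Append on this row must wait only while the row has pending writes. */
    boolean mustWait(String row) {
        return pending.containsKey(row);
    }
}
```

With a structure like this, a read-modify-write operation would check only its own row instead of waiting for every in-progress write in the region.
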
Thanks.
{code}
public WriteEntry beginMemstoreInsert() {
  synchronized (writeQueue) {
    long nextWriteNumber = ++memstoreWrite;
    WriteEntry e = new WriteEntry(nextWriteNumber);
    writeQueue.add(e);
    return e;
  }
}

boolean advanceMemstore(WriteEntry e) {
  synchronized (writeQueue) {
    ...
    while (!writeQueue.isEmpty()) {
      ...
    }
  }
}
{code}
> [BRAINSTORM] Combine MVCC and SeqId
> -----------------------------------
>
> Key: HBASE-8763
> URL: https://issues.apache.org/jira/browse/HBASE-8763
> Project: HBase
> Issue Type: Improvement
> Components: regionserver
> Reporter: Enis Soztutar
> Attachments: hbase-8736-poc.patch, hbase-8763_wip1.patch
>
>
> HBASE-8701 and a lot of recent issues include good discussions about mvcc +
> seqId semantics. It seems that having mvcc and the seqId complicates the
> comparator semantics a lot in regards to flush + WAL replay + compactions +
> delete markers and out of order puts.
> Thinking more about it I don't think we need a MVCC write number which is
> different than the seqId. We can keep the MVCC semantics, read point and
> smallest read points intact, but combine mvcc write number and seqId. This
> will allow cleaner semantics + implementation + smaller data files.
> We can do some brainstorming for 0.98. We still have to verify that this
> would be semantically correct, it should be so by my current understanding.