Andrew Kyle Purtell created HBASE-24440:
-------------------------------------------
Summary: Prevent temporal misordering on timescales smaller than one clock tick
Key: HBASE-24440
URL: https://issues.apache.org/jira/browse/HBASE-24440
Project: HBase
Issue Type: Improvement
Reporter: Andrew Kyle Purtell
When mutations are sent to the servers without a timestamp explicitly assigned
by the client, the server will substitute the current wall clock time. There are
edge cases where it is at least theoretically possible for more than one
mutation to be committed to a given row within the same clock tick. When this
happens, we have to track and preserve the ordering of these mutations in some
way other than the timestamp component of the key. Let me bypass most of that
discussion here by noting that, whether we do this or not, we do not pass such
ordering information in the cross-cluster replication protocol. We also have
interesting edge cases regarding key type precedence when mutations arrive
"simultaneously" (within the same tick): at equal timestamps we sort deletes
ahead of puts, so a put committed after a delete in the same tick is masked by
that delete. This, especially in the presence of replication, can lead to
visible anomalies for clients able to interact with both source and sink.
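The delete-before-put tie-break can be illustrated with a toy model. This is a sketch under stated assumptions, not HBase's comparator: it models a single column, treats a delete marker as masking every version at or below its timestamp, and reads only the newest visible version. The class `SameTickOrdering`, its `Type` enum, `Cell` record, and `read` method are hypothetical names. It shows that when a delete and a put share a timestamp, the read result no longer depends on which one was committed first.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical, simplified model of cell ordering within one row/column.
// NOT HBase's CellComparator; it only mimics the rule that, at equal
// timestamps, deletes sort ahead of puts.
public class SameTickOrdering {

    // Declaration order matters: DELETE sorts before PUT at equal timestamps.
    enum Type { DELETE, PUT }

    record Cell(long ts, Type type, String value) {}

    // Newest timestamp first; ties broken with DELETE ahead of PUT.
    static final Comparator<Cell> ORDER =
        Comparator.comparingLong(Cell::ts).reversed()
                  .thenComparing(Cell::type);

    // Returns the newest visible value, or null if it is masked or absent.
    static String read(List<Cell> cells) {
        List<Cell> sorted = new ArrayList<>(cells);
        sorted.sort(ORDER);
        if (sorted.isEmpty()) {
            return null;
        }
        Cell first = sorted.get(0);
        // In this model a delete masks every cell at or below its timestamp,
        // and every remaining cell is at or below it in this order.
        return first.type() == Type.PUT ? first.value() : null;
    }

    public static void main(String[] args) {
        long tick = 1_000L; // both mutations land in the same millisecond

        // Intended order: delete, then put "v2".
        List<Cell> deleteThenPut = List.of(
            new Cell(tick, Type.DELETE, null),
            new Cell(tick, Type.PUT, "v2"));
        System.out.println(read(deleteThenPut)); // prints "null"

        // Opposite commit order, same timestamps: same result.
        List<Cell> putThenDelete = List.of(
            new Cell(tick, Type.PUT, "v2"),
            new Cell(tick, Type.DELETE, null));
        System.out.println(read(putThenDelete)); // prints "null"

        // Had the put landed one tick later, it would be visible.
        List<Cell> nextTick = List.of(
            new Cell(tick, Type.DELETE, null),
            new Cell(tick + 1, Type.PUT, "v2"));
        System.out.println(read(nextTick)); // prints "v2"
    }
}
```

Because the commit order is not recorded anywhere in the model, the first two cases are indistinguishable to a reader; only separating the mutations into different ticks restores the intended outcome.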
There is a simple solution that removes the possibility of these edge cases:
We can detect, when we are about to commit a mutation to a row, whether we have
already committed a mutation to this same row in the current clock tick.
Occurrences of this condition will be rare. We already track the current time,
because we need it to assign the timestamp. Where this becomes interesting is
how we might track the last commit time per row. Making this detection
efficient on the normal code path is the bulk of the challenge; we would do it
somehow via the memstore. Assuming we can efficiently know whether we are about
to commit twice to the same row within a single clock tick, we would simply
sleep/yield the current thread until the clock ticks over, and then proceed.
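The proposed check can be sketched roughly as follows. This is a minimal illustration under loudly stated assumptions, not a design for HBase: row keys are modeled as `String`, a `ConcurrentHashMap` stands in for whatever per-row state the memstore would hold (with no eviction), the clock is injectable, and the caller is assumed to already hold an exclusive lock on the row. `SameTickGuard` and `assignTimestamp` are hypothetical names, not HBase APIs.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

// Hypothetical sketch of the proposed same-tick check. A ConcurrentHashMap
// stands in for per-row state that would live in (or near) the memstore;
// entries are never evicted here, which a real implementation must address.
public class SameTickGuard {
    // Last timestamp assigned to a commit on each row.
    private final ConcurrentHashMap<String, Long> lastCommitTick = new ConcurrentHashMap<>();
    // Injectable clock, e.g. System::currentTimeMillis; replaceable in tests.
    private final LongSupplier clock;

    public SameTickGuard(LongSupplier clock) {
        this.clock = clock;
    }

    /**
     * Returns a timestamp for a commit to {@code row} that is strictly greater
     * than the timestamp of the previous commit to that row.
     * Assumes the caller holds an exclusive lock on {@code row}, so no two
     * threads run this for the same row at once.
     */
    public long assignTimestamp(String row) {
        long now = clock.getAsLong();
        Long last = lastCommitTick.get(row);
        // Rare path: this row already saw a commit in the current tick (or the
        // clock stepped backwards, in which case this waits until it catches up).
        while (last != null && now <= last) {
            Thread.yield(); // scheduler hint; the loop re-reads the clock
            now = clock.getAsLong();
        }
        lastCommitTick.put(row, now);
        return now;
    }

    public static void main(String[] args) {
        SameTickGuard guard = new SameTickGuard(System::currentTimeMillis);
        long t1 = guard.assignTimestamp("row1");
        long t2 = guard.assignTimestamp("row1"); // waits if still in t1's tick
        System.out.println(t2 > t1); // prints "true"
    }
}
```

In this sketch the common path costs one map lookup and one comparison; the loop only runs when a second commit to the same row arrives within one tick. Yielding rather than sleeping is one possible choice, since the remaining wait is under one tick and `Thread.sleep` granularity may be coarser than that on some platforms.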
--
This message was sent by Atlassian Jira
(v8.3.4#803005)