[ https://issues.apache.org/jira/browse/HBASE-24440?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17121293#comment-17121293 ]
Andrew Kyle Purtell edited comment on HBASE-24440 at 6/1/20, 8:12 PM:
----------------------------------------------------------------------

I am aware. If we do this, I don't think we will need it at all, configurable or not. But that is out of scope for this issue.

Edit: Some might respond, validly, that this is splitting hairs, because one follows the other: if we will never have two identical keys, including timestamps, committed to a row, then we don't need a sorting rule by operator precedence for a case that, after this proposed change, can never happen. I am proposing we do it in steps, with small reversible changes, because this is such a critical area for correctness. But if the consensus is to do it together, I would not oppose that, for what it's worth.

> Prevent temporal misordering on timescales smaller than one clock tick
> ----------------------------------------------------------------------
>
> Key: HBASE-24440
> URL: https://issues.apache.org/jira/browse/HBASE-24440
> Project: HBase
> Issue Type: Brainstorming
> Reporter: Andrew Kyle Purtell
> Priority: Major
>
> When mutations are sent to the servers without a timestamp explicitly
> assigned by the client, the server will substitute the current wall clock
> time. There are edge cases where it is at least theoretically possible for
> more than one mutation to be committed to a given row within the same clock
> tick.
> When this happens, we have to track and preserve the ordering of these
> mutations in some other way besides the timestamp component of the key. Let
> me bypass most discussion here by noting that, whether we do this or not, we
> do not pass such ordering information in the cross-cluster replication
> protocol. We also have interesting edge cases regarding key type precedence
> when mutations arrive "simultaneously": we sort deletes ahead of puts. This,
> especially in the presence of replication, can lead to visible anomalies for
> clients able to interact with both source and sink.
>
> There is a simple solution that removes the possibility that these edge cases
> can occur:
>
> We can detect, when we are about to commit a mutation to a row, whether we
> have already committed a mutation to this same row in the current clock tick.
> Occurrences of this condition will be rare. We are already tracking current
> time. We have to know this in order to assign the timestamp. Where this
> becomes interesting is how we might track the last commit time per row.
> Making the detection of this case efficient for the normal code path is the
> bulk of the challenge. One option is to keep track of the last locked time
> for row locks. (Todo: How would we track and garbage-collect this efficiently
> and correctly? Not the ideal option.) We might also do this tracking somehow
> via the memstore. (At least in this case the lifetime and distribution of
> in-memory row state, including the proposed timestamps, would align.)
> Assuming we can efficiently know if we are about to commit twice to the same
> row within a single clock tick, we would simply sleep/yield the current
> thread until the clock ticks over, and then proceed.
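A minimal sketch of the proposed mechanism follows, in Java since HBase is written in Java. It is not HBase code. The class `SameTickGuard`, its methods, and the `ConcurrentHashMap` keyed by row are hypothetical stand-ins for the row-lock or memstore tracking that the description mentions. The sketch assumes that one clock tick is one unit of the injected `LongSupplier`, for example milliseconds from `System::currentTimeMillis`. It detects a second commit to the same row within a tick and yields the thread until the clock ticks over. `evictOlderThan` is one assumed answer to the garbage-collection Todo: once the clock has passed a row's last commit tick, that entry can no longer cause a collision, provided the clock does not step backwards.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch (not HBase code): hands out server-side timestamps so
 * that no tracked row is committed to twice within the same clock tick. If the
 * row already has a commit in the current tick, the caller yields the thread
 * until the clock ticks over, then proceeds.
 */
final class SameTickGuard {
    // Source of "current time"; one tick is one unit of whatever it returns
    // (e.g. System::currentTimeMillis). Injected so it can be faked in tests.
    private final LongSupplier clock;

    // Assumed per-row bookkeeping: the last tick in which each row was committed.
    private final ConcurrentHashMap<String, Long> lastCommitTick = new ConcurrentHashMap<>();

    SameTickGuard(LongSupplier clock) {
        this.clock = clock;
    }

    /** Returns a tick later than the last one given out for this row (while it is tracked). */
    long assignTimestamp(String row) {
        while (true) {
            long now = clock.getAsLong();
            Long last = lastCommitTick.get(row);
            if (last == null) {
                // No tracked commit for this row: claim the current tick atomically.
                if (lastCommitTick.putIfAbsent(row, now) == null) {
                    return now;
                }
            } else if (now > last) {
                // The clock has moved past the row's last commit: claim this tick
                // with a compare-and-set so two racing writers cannot both win.
                if (lastCommitTick.replace(row, last, now)) {
                    return now;
                }
            } else {
                // Same tick (or the clock stepped backwards): wait for it to tick over.
                Thread.yield();
            }
            // After yielding, or after losing a race to another writer, re-read and retry.
        }
    }

    /** Drops rows whose last commit tick is older than {@code tick}. */
    void evictOlderThan(long tick) {
        for (Map.Entry<String, Long> e : lastCommitTick.entrySet()) {
            if (e.getValue() < tick) {
                // Conditional remove: skipped if a writer updated the row meanwhile.
                lastCommitTick.remove(e.getKey(), e.getValue());
            }
        }
    }

    /** Number of rows currently tracked (for inspection and testing). */
    int trackedRows() {
        return lastCommitTick.size();
    }
}
```

With `new SameTickGuard(System::currentTimeMillis)`, two back-to-back `assignTimestamp("row1")` calls return different values, and the second is strictly greater than the first. If commits to a row are already serialized by its row lock, the compare-and-set is redundant but harmless. It is kept so that the sketch is safe on its own.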