[ https://issues.apache.org/jira/browse/CASSANDRA-15745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17099825#comment-17099825 ]
Benedict Elliott Smith edited comment on CASSANDRA-15745 at 5/5/20, 12:10 PM:
------------------------------------------------------------------------------

Thanks for the report [~Osipov]. Just to reformulate (mostly just removing "Topology change is advertised on A", as I don't believe this is a necessary step):

# Topology change starts on C, replacing A with D, only visible on C
# CAS1 starts on C with \{A, B, C, D}
# CAS2 (ballot > CAS1) starts on A with \{A, B, C}
# CAS1 prepares on \{B, C, D} (timeout on A)
# CAS2 prepares and accepts on \{A, B} (timeout on C); commits on A; terminates
# CAS1 accepts on D; terminates
# Topology change finishes (A is removed), visible globally
# CAS3 prepares with \{C, D}, sees accept of CAS1 and re-proposes it (with a newer ballot)

Unfortunately this isn't trivial to fix, though there is more than one approach. I happen to have an incomplete piece of work that should be able to address this issue, but I have no timeline on when I may be able to propose it here as a patch.
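The sequence above can be replayed with a toy single-decree Paxos model. This is only a sketch for following the steps, not Cassandra's implementation: the `Replica` class, the `quorum` helper, the integer ballots 1, 2 and 3, and the assumption that each coordinator needs a simple majority of the replicas it believes exist are all hypothetical. It also delivers CAS1's accept to B purely to show that B rejects it after promising CAS2's ballot.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


def quorum(replica_count: int) -> int:
    """Simple majority of the replicas a coordinator believes exist (assumption)."""
    return replica_count // 2 + 1


@dataclass
class Replica:
    """Hypothetical replica holding single-decree Paxos state for one partition."""
    name: str
    promised: int = 0
    accepted: Optional[Tuple[int, str]] = None   # (ballot, value)
    committed: Optional[Tuple[int, str]] = None  # (ballot, value)

    def prepare(self, ballot: int):
        """Promise the ballot if it is newer; always report known state."""
        if ballot <= self.promised:
            return False, self.accepted, self.committed
        self.promised = ballot
        return True, self.accepted, self.committed

    def accept(self, ballot: int, value: str) -> bool:
        """Accept the proposal unless a newer ballot has been promised."""
        if ballot < self.promised:
            return False
        self.promised = ballot
        self.accepted = (ballot, value)
        return True

    def commit(self, ballot: int, value: str) -> None:
        self.committed = (ballot, value)


def run_scenario():
    A, B, C, D = (Replica(n) for n in "ABCD")

    # Steps 1-3: C already sees {A, B, C, D}; A still sees {A, B, C}.
    assert quorum(4) == 3 and quorum(3) == 2

    # Step 4: CAS1 (ballot 1) prepares on {B, C, D}; A times out.
    assert all(r.prepare(1)[0] for r in (B, C, D))

    # Step 5: CAS2 (ballot 2) prepares and accepts on {A, B}; C times out.
    assert all(r.prepare(2)[0] for r in (A, B))
    assert all(r.accept(2, "CAS2") for r in (A, B))
    A.commit(2, "CAS2")

    # Step 6: CAS1's accept lands on D only; B has promised ballot 2 and rejects.
    cas1_accepted_on = [r.name for r in (B, D) if r.accept(1, "CAS1")]

    # Step 7: topology change finishes, A is gone; replicas are now {B, C, D}.
    # Step 8: CAS3 (ballot 3) prepares on {C, D}, a majority of three.
    replies = [r.prepare(3) for r in (C, D)]
    in_progress = max((acc for _, acc, _ in replies if acc), default=None)
    saw_cas2 = any(
        state is not None and state[1] == "CAS2"
        for _, acc, com in replies
        for state in (acc, com)
    )
    return cas1_accepted_on, in_progress, saw_cas2


if __name__ == "__main__":
    print(run_scenario())
```

In this model CAS3's prepare quorum \{C, D} does not intersect CAS2's quorum \{A, B}, so CAS3 sees only the accept of CAS1 left on D and has nothing telling it that CAS2 was already committed.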
> Conflicting LWT transactions may be committed during topology change
> --------------------------------------------------------------------
>
>                 Key: CASSANDRA-15745
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15745
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Feature/Lightweight Transactions
>            Reporter: Konstantin
>            Priority: Normal
>
> Let's consider a cluster which consists of replicas A, B and C.
> We're adding replica D which replaces A.
> A scenario is possible when two conflicting transactions, CAS1 and CAS2, may
> be committed during replace:
> CAS2 ballot > CAS1 ballot
> CAS2 and CAS1 conflict on LWT condition, yet both of them may be committed
> in case of the following sequence of events:
> Topology change starts, advertises on C
> CAS1 starts on node C, uses {A, B, C, D}
> CAS2 starts on node A, still uses {A, B, C}
> Topology change is advertised on A
> CAS1 prepares on {B, C, D}
> CAS2 prepares and accepts on {A, B}, commits on A
> CAS1 accepts on D, then stops
> Streaming starts, topology change finishes, A is removed
> CAS3 prepares using C and D. It sees the accept of CAS1 and replays it
> Both CAS1 and CAS2 are committed.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)