[ https://issues.apache.org/jira/browse/CASSANDRA-15350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16962278#comment-16962278 ]
Yifan Cai commented on CASSANDRA-15350:
---------------------------------------

In the current {{cas}} implementation, a WriteTimeoutException ({{WriteType.CAS}}) is thrown in the following scenarios:

* The overall {{cas}} operation times out.
* The PREPARE phase times out.
** Multiple retries fail and the operation eventually times out.
** RPC requests to nodes time out (networking).
** Multiple proposers contend. Each proposer gets promises from a majority and pre-empts the other proposers from proceeding to the PROPOSE phase. When the other proposers (believing they are still the winners when in fact they are not) send their proposals, they get rejections from *ALL* acceptors. The contention continues until time runs out.
** A repair attempt is added in this phase.
*** The proposal to replay the previously accepted update times out.
*** The commit of the update times out.
* The PROPOSE phase times out.
** RPC requests to nodes time out (networking).
** The proposal is sent to *ALL* acceptors and the proposer waits.
*** If successful, i.e. a majority accepts, we are good.
*** If *all* acceptors reject, it is safe for the proposer to re-submit the proposal with a higher ballot.
*** {color:#ff8b00}If some but *not a quorum* accept, the proposal may or may not be replayed by new proposers. (Uncertainty){color}
**** If the new proposer reaches the acceptors that accepted the old proposal, it replays the proposal when it is the most recent in-progress one.
**** If the new proposer does not reach those acceptors, it is free to choose a value, possibly making the earlier proposal ineligible for replay.
* The COMMIT phase times out.
** Applying the update times out. Note that this is a normal write, so the WriteType is {{SIMPLE}} instead of {{CAS}}.
** The only exception is when the timeout comes from the repair attempt in the PREPARE phase. In this case, the WriteType is overridden to {{CAS}}.

Most of the timeouts in the list are genuine, except the one colored in {color:#ff8b00}orange{color}.
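To make the three PROPOSE outcomes listed above concrete, here is a minimal sketch of how they could be told apart from the acceptor responses. All names ({{ProposeOutcomeSketch}}, {{Outcome}}, {{classify}}) are hypothetical and are not Cassandra's actual classes. The sketch also assumes that acceptors which did not respond may have accepted, so any result that is neither a quorum of accepts nor a rejection from every acceptor is treated as uncertain.

```java
// Hypothetical sketch only: these names are illustrative, not Cassandra's API.
final class ProposeOutcomeSketch {

    enum Outcome {
        ACCEPTED,         // a majority accepted: the proposal succeeded
        REJECTED_BY_ALL,  // every acceptor rejected: safe to retry with a higher ballot
        UNCERTAIN         // anything else: the proposal may or may not be replayed later
    }

    /**
     * Classifies the result of a PROPOSE round.
     *
     * @param replicas number of acceptors the proposal was sent to
     * @param accepts  number of acceptors that replied with an accept
     * @param rejects  number of acceptors that replied with a rejection
     */
    static Outcome classify(int replicas, int accepts, int rejects) {
        if (replicas <= 0 || accepts < 0 || rejects < 0 || accepts + rejects > replicas) {
            throw new IllegalArgumentException("inconsistent response counts");
        }
        int quorum = replicas / 2 + 1;
        if (accepts >= quorum) {
            return Outcome.ACCEPTED;
        }
        if (rejects == replicas) {
            return Outcome.REJECTED_BY_ALL;
        }
        // Assumption: a silent acceptor may have accepted, so a missing
        // response is treated the same as a partial accept.
        return Outcome.UNCERTAIN;
    }

    public static void main(String[] args) {
        System.out.println(classify(3, 2, 1)); // majority accepted
        System.out.println(classify(3, 0, 3)); // all rejected
        System.out.println(classify(3, 1, 1)); // one accept, one reject, one silent
    }
}
```

Only the {{UNCERTAIN}} case corresponds to the orange item; the other two have a definite answer and do not need a new message type.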
> Add CAS “uncertainty” and “contention” messages that are currently propagated
> as a WriteTimeoutException.
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15350
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15350
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Feature/Lightweight Transactions
>            Reporter: Alex Petrov
>            Priority: Normal
>              Labels: client-impacting, protocolv5
>
> Right now, CAS uncertainty introduced in
> https://issues.apache.org/jira/browse/CASSANDRA-6013 is propagated as a
> WriteTimeout. One of the conditions under which it manifests is when at least
> one acceptor has accepted the value, which means that this value _may_
> still get accepted during a later round, despite the proposer failure.
> A similar problem occurs with CAS contention, which is also indistinguishable
> from a “regular” timeout, even though it is reported correctly in metrics.