[
https://issues.apache.org/jira/browse/CASSANDRA-15350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16962278#comment-16962278
]
Yifan Cai commented on CASSANDRA-15350:
---------------------------------------
In the current {{cas}} implementation, WriteTimeoutExceptions
({{WriteType.CAS}}) are thrown in the following scenarios:
* The overall {{cas}} operation times out.
* The PREPARE phase times out.
** Multiple retries fail and the operation eventually times out.
** RPC requests to nodes time out. (networking)
** Multiple proposers contend. Each proposer gets promises from the majority and
pre-empts the other proposers from proceeding to the PROPOSE phase. When the other
proposers (thinking they are still the winners, but in fact not) send their
proposals, they get rejections from *ALL* acceptors. Such contention continues
until time runs out.
** A repair attempt is added in this phase.
*** The proposal to replay the previously accepted update times out.
*** The commit of the update times out.
* The PROPOSE phase times out.
** RPC requests to nodes time out. (networking)
** The proposal is sent to *ALL* acceptors and the proposer waits.
*** If successful, i.e. a majority accepts, we are good.
*** If *all* acceptors reject, it is safe for the proposer to re-submit the
proposal with a higher ballot.
*** {color:#ff8b00}If some but *not a quorum* accept, the proposal may or may
not be replayed by new proposers. (Uncertainty){color}
**** If the new proposer reaches the acceptors that accepted the old
proposal, it replays the proposal when it is the most recent in-progress one.
**** If the new proposer does not reach those acceptors, it is free to choose
a new value, which may leave the earlier proposal no longer eligible for
replay.
* The COMMIT phase times out.
** Applying the update times out. Note that this is a normal write, so the
WriteType is {{SIMPLE}} instead of {{CAS}}.
** The only exception is when the timeout comes from the repair attempt in the
PREPARE phase. In this case, the WriteType is overridden to {{CAS}}.
Most of the timeouts in the list are genuine, except the one colored in
{color:#ff8b00}orange{color}.
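To make the PROPOSE-phase cases above concrete, here is a minimal sketch of how the responses could be classified. It is not the actual Cassandra code: the class {{ProposeOutcomeSketch}}, the {{Outcome}} enum and the {{classify}} method are hypothetical names used only for illustration, and treating "no accepts seen, some acceptors unanswered" as a plain timeout is an assumption of this sketch.

```java
// Hypothetical sketch, not Cassandra's implementation.
public class ProposeOutcomeSketch {

    // Possible results of the PROPOSE phase as described in the list above.
    public enum Outcome { ACCEPTED, REJECTED_BY_ALL, UNCERTAIN, TIMED_OUT }

    // A quorum is a strict majority of the acceptors.
    public static int quorum(int acceptors) {
        return acceptors / 2 + 1;
    }

    // acceptors: number of acceptors the proposal was sent to
    // accepts:   number of acceptors that accepted before the deadline
    // rejects:   number of acceptors that rejected before the deadline
    public static Outcome classify(int acceptors, int accepts, int rejects) {
        if (acceptors <= 0 || accepts < 0 || rejects < 0 || accepts + rejects > acceptors) {
            throw new IllegalArgumentException("inconsistent response counts");
        }
        if (accepts >= quorum(acceptors)) {
            return Outcome.ACCEPTED;        // majority accepted: we are good
        }
        if (rejects == acceptors) {
            return Outcome.REJECTED_BY_ALL; // safe to re-submit with a higher ballot
        }
        if (accepts > 0) {
            return Outcome.UNCERTAIN;       // some but not a quorum accepted (the orange case)
        }
        // Assumption of this sketch: no accept was seen and some acceptors did not
        // answer, so this is reported as a genuine timeout. A stricter variant could
        // report UNCERTAIN here, since an unanswered acceptor may have accepted.
        return Outcome.TIMED_OUT;
    }

    public static void main(String[] args) {
        System.out.println(classify(3, 2, 1)); // quorum of accepts
        System.out.println(classify(3, 0, 3)); // every acceptor rejected
        System.out.println(classify(3, 1, 2)); // one accept, below quorum
        System.out.println(classify(3, 0, 2)); // no accepts, one acceptor silent
    }
}
```

Only the {{UNCERTAIN}} branch corresponds to the non-genuine timeout. The other branches either succeed, can be retried safely, or are real timeouts.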
> Add CAS “uncertainty” and “contention” messages that are currently propagated
> as a WriteTimeoutException.
> ---------------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-15350
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15350
> Project: Cassandra
> Issue Type: Improvement
> Components: Feature/Lightweight Transactions
> Reporter: Alex Petrov
> Priority: Normal
> Labels: client-impacting, protocolv5
>
> Right now, CAS uncertainty introduced in
> https://issues.apache.org/jira/browse/CASSANDRA-6013 is propagating as
> WriteTimeout. One of the conditions under which it manifests is when there’s at
> least one acceptor that has accepted the value, which means that this value _may_
> still get accepted during a later round, despite the proposer failure.
> A similar problem happens with CAS contention, which is also indistinguishable
> from the “regular” timeout, even though it is visible in metrics correctly.