Alex Petrov commented on CASSANDRA-15350:

To be honest, I think the names {{WriteStalled}} and {{WriteTimeout}} are 
close enough to each other that they might confuse users. The name needs to 
reflect either that this is a Paxos round failure, or that _we do not know_ 
whether the value is going to go through or not.

bq. ErrorMessage is not involved in internode messaging

Err; of course. Sorry about that: I was thinking in a different context and 
phrased it wrong. Also, both messages occur on the coordinator, so internode 
doesn't apply. What I should have written is that this logic is used by the 

bq. the scenario was carefully crafted to be deterministic and aims to produce 
the same kind of contention.

This is precisely what I'm concerned about: it is carefully crafted, so it 
might be difficult to maintain. Anyone who modifies the code in the future 
will have to re-craft the test as well. I think we can relatively easily 
reproduce it with a fuzz test that introduces contention. Introducing 
latency/partitions in the test is reasonable; I'd just make it random rather 
than handcrafted. This will also help us see how it all behaves when 
contention is higher.
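
To make the "random rather than handcrafted" idea concrete, here is a rough sketch of how a fuzz test could derive its faults from a seed. Everything in it is hypothetical and not taken from the patch or from the Cassandra test framework: the {{RandomFaultSchedule}} class, the {{generate}} method and its parameters are names made up for illustration. The only point is that the delay/drop plan comes from a seeded {{java.util.Random}}, so a failing run can be replayed by reusing the seed.

```java
import java.util.Random;

/** Hypothetical helper, not part of Cassandra: derives a per-message fault plan from a seed. */
final class RandomFaultSchedule {
    /** Marker for a message that should be dropped (simulating a partition). */
    static final int DROP = -1;

    /**
     * Returns one entry per message: DROP, or a delivery delay in milliseconds
     * in the range [0, maxDelayMillis].
     */
    static int[] generate(long seed, int messages, double dropProbability, int maxDelayMillis) {
        // Same seed => same plan, so a failure found by the fuzz test can be reproduced.
        Random random = new Random(seed);
        int[] plan = new int[messages];
        for (int i = 0; i < messages; i++) {
            plan[i] = random.nextDouble() < dropProbability
                      ? DROP
                      : random.nextInt(maxDelayMillis + 1);
        }
        return plan;
    }
}
```

A test would pick a fresh seed per run, log it, and feed the resulting plan into whatever mechanism it uses to delay or drop messages between nodes. Raising {{dropProbability}} or {{maxDelayMillis}} would be one way to push contention higher.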

bq. Do you mean rename the method to activate/deactivate,

Right, I'd just call them {{activate}} and {{deactivate}}.

We also need at least a version of the {{SimpleClient}} to be tested with the 
changes. Ideally, we need an accompanying patch for the java-driver, since 
this change modifies the native protocol.

> Add CAS “uncertainty” and “contention” messages that are currently propagated 
> as a WriteTimeoutException.
> ---------------------------------------------------------------------------------------------------------
>                 Key: CASSANDRA-15350
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15350
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Feature/Lightweight Transactions
>            Reporter: Alex Petrov
>            Assignee: Yifan Cai
>            Priority: Normal
>              Labels: protocolv5, pull-request-available
>         Attachments: Utf8StringEncodeBench.java
>          Time Spent: 20m
>  Remaining Estimate: 0h
> Right now, CAS uncertainty introduced in 
> https://issues.apache.org/jira/browse/CASSANDRA-6013 is propagated as 
> WriteTimeout. One of the conditions under which it manifests is when at least 
> one acceptor has accepted the value, which means that this value _may_ 
> still get accepted during a later round, despite the proposer failure. 
> A similar problem happens with CAS contention, which is also indistinguishable 
> from the “regular” timeout, even though it is reported correctly in metrics.

This message was sent by Atlassian Jira

