[ 
https://issues.apache.org/jira/browse/CASSANDRA-15350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16990294#comment-16990294
 ] 

Yifan Cai commented on CASSANDRA-15350:
---------------------------------------

[~ifesdjeen], thanks for the reply. 
h4. Exception naming

Put more thoughts into the exception naming (I am definitely not good at it :/ 
). 
 # The exception is *not* sent to clients, e.g. java driver. What being sent to 
the clients is the error message with the error code defined at 
[native_protocol_v5, Section 9]. 
 # The exception is used mainly to understanding the code or reasoning 
about/debugging server errors from log. 
 # The exception is caused by contention during Paxos
 # The result of the CAS is unknown, and the exception should express the 
status.
 # Timeout implies the result is unknown and the operation exceeds the 
permitted duration, which is not the case here.

The name of the exception should describe what happened (the unknown result) 
and how it happened (due to contention during Paxos). 

{{CasWriteResultUnknownException}} is a reasonable name too. The name indicates 
the result explicitly and how it happened. 

The error code name in protocol spec should be updated to 
{{CAS_WRITE_RESULT_UNKNOWN}} from {{CAS_UNCERTAINTY}} in order to easily 
associate with the exception when reading the code.
h4. The cas write test case

Understand the concern. IMO, the purpose of the test is to validate the specs 
remains correct with the new changes. The tests defines the specs. If someone 
changes the code related to CAS and the tests are broken, it is very likely the 
changes introduce some bug. However, if the implementation of CAS is entirely 
changed, then the tests are not valid anymore and can be removed from the code 
base. 

Paxos has multiple rounds and there are rules in each round. The tests check 
the rules and assert the expected result of each round. Although the test case 
here is not comprehensive, it provides a certain level of coverage for the 
contention case. Fuzzing can surely be more inclusive. 

The maintenance concern might be bigger than I thought. So I am all ear and OK 
to remove those complex test cases if insist. 
h4. Others
 * The test case for encode/decode the error message with the new fields are 
missing. I will add them in the {{ErrorMessageTest}} to test the scenario when 
communicating with a lower version client.
 * Maybe the {{CasWriteTimeoutException}} is not necessary? I take a second 
look and feel the {{contentions#}} sent to clients might not be helpful to 
them. 
 * The {{InterceptFilter}} can clog the message outbound sink. It is because 
the jvm dtest instance run the filters in the same thread. I made the 
{{MessageDeliverySink}} to run in async for each outbound message to avoid 
blocking the evaluation of other filters. 

> Add CAS “uncertainty” and “contention" messages that are currently propagated 
> as a WriteTimeoutException.
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15350
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15350
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Feature/Lightweight Transactions
>            Reporter: Alex Petrov
>            Assignee: Yifan Cai
>            Priority: Normal
>              Labels: protocolv5, pull-request-available
>         Attachments: Utf8StringEncodeBench.java
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Right now, CAS uncertainty introduced in 
> https://issues.apache.org/jira/browse/CASSANDRA-6013 is propagating as 
> WriteTimeout. One of this conditions it manifests is when there’s at least 
> one acceptor that has accepted the value, which means that this value _may_ 
> still get accepted during the later round, despite the proposer failure. 
> Similar problem happens with CAS contention, which is also indistinguishable 
> from the “regular” timeout, even though it is visible in metrics correctly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to