[ 
https://issues.apache.org/jira/browse/CASSANDRA-15350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16986310#comment-16986310
 ] 

Yifan Cai commented on CASSANDRA-15350:
---------------------------------------

[~ifesdjeen] and [~spod], big thanks for reviewing the patch!

Renaming the exception to {{CasWriteStalledException}} and the suggested 
rephrased description for {{CAS_UNCERTAINTY}} both sound good. 
 The meaning of {{CasWriteUncertainty}} is vague and, as was pointed out, the 
WTE indicates that the result is uncertain too. {{CasWriteStalled}} describes 
what happened better. 
 I meant to write _Paxos read_ in the description for the 
{{CAS_UNCERTAINTY}} clause. I will update the description, taking the suggested 
wording into account. 

Regarding the cross-version scenarios, I may be wrong, but my current 
understanding is that the ErrorMessage is *not* involved in internode 
messaging. ErrorMessage, derived from 
{{org.apache.cassandra.transport.Message.Response}}, is client-facing. When a 
sub-V5 (i.e. V4) client connects to the V5 server and gets the 
{{CasWriteTimeoutException}}, the server encoding makes sure to produce a 
backward-compatible one, so the sub-V5 client can still understand the 
server response. 
 The {{decode}} method in {{ErrorMessage}} seems to be useful only for 
{{org.apache.cassandra.transport.SimpleClient/Client}}, which is not started in 
a Cassandra server.
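
A minimal sketch of the backward-compatible encoding idea described above. The enum, the method name, and the version constant are hypothetical stand-ins, not Cassandra's actual classes; the sketch only assumes that a new error kind is reported to pre-V5 clients as the write timeout they already know.

```java
final class ErrorDowngradeDemo {
    // Hypothetical stand-ins for the error kinds discussed in this ticket.
    enum ErrorKind { WRITE_TIMEOUT, CAS_WRITE_STALLED }

    // Assumption: the new error kinds are only understood from protocol V5 on.
    static final int V5 = 5;

    // Picks the error kind the server would put on the wire for a given client.
    static ErrorKind kindForClient(ErrorKind actual, int clientProtocolVersion) {
        if (clientProtocolVersion < V5 && actual == ErrorKind.CAS_WRITE_STALLED) {
            // Older clients cannot decode the new kind, so fall back to the
            // error they already handle.
            return ErrorKind.WRITE_TIMEOUT;
        }
        return actual;
    }

    public static void main(String[] args) {
        System.out.println(kindForClient(ErrorKind.CAS_WRITE_STALLED, 4));
        System.out.println(kindForClient(ErrorKind.CAS_WRITE_STALLED, 5));
    }
}
```

Because the choice is made on the server at encode time, the client side needs no changes to keep working.
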
{quote}Unless you're submitting patches to 2.2, 3.0, and 3.11, let's roll back 
changes to IMessageFilters, since their API has to be binary compatible with 
older versions.
{quote}
The test cases in {{CasWriteTest}} rely on the message intercept function. I 
will back-port the changes to IMessageFilters to the prior versions.
{quote}Should we add timeout tests for responses as well as requests in 
CasWriteTest?
{quote}
Sure, sounds good.
{quote}Is it possible to try and simplify testCasWriteTimeoutDueToContention, 
can we achieve contention with N threads?
{quote}
The test does achieve contention with N threads (1 thread per client). In 
addition, the scenario was carefully crafted to be deterministic, so that it 
produces the same kind of contention on every run.
{quote}both tests peer quite a lot into implementation internals.
{quote}
The test cases mainly manipulate the internode networking to introduce 
latency/partition. In order to produce (and always produce) a rare contention 
scenario, I think that fine-grained control is necessary.
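
A toy illustration of why a message filter makes such a scenario reproducible. Everything here is hypothetical ({{DropFilter}}, the verb strings, the node numbering) and is not the in-JVM dtest API; it only shows a filter that deterministically drops one kind of message to one node until it is switched off.

```java
import java.util.ArrayList;
import java.util.List;

final class DropFilterDemo {
    // Hypothetical filter: while on, drops messages with one verb sent to one node.
    static final class DropFilter {
        private final int toNode;
        private final String verb;
        private boolean active = true;

        DropFilter(int toNode, String verb) {
            this.toNode = toNode;
            this.verb = verb;
        }

        void on()  { active = true; }
        void off() { active = false; }

        boolean drops(int to, String messageVerb) {
            return active && to == toNode && verb.equals(messageVerb);
        }
    }

    // Sends a message to nodes 1..nodes and returns the nodes that received it.
    static List<Integer> broadcast(int nodes, String verb, DropFilter filter) {
        List<Integer> delivered = new ArrayList<>();
        for (int node = 1; node <= nodes; node++) {
            if (!filter.drops(node, verb)) {
                delivered.add(node);
            }
        }
        return delivered;
    }

    public static void main(String[] args) {
        DropFilter filter = new DropFilter(2, "PROPOSE");
        System.out.println(broadcast(3, "PROPOSE", filter)); // node 2 is cut off
        filter.off();
        System.out.println(broadcast(3, "PROPOSE", filter)); // all nodes reachable
    }
}
```

Since the filter, not thread timing, decides which replica misses which message, the same interleaving is produced on every run.
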
{quote}In ErrorMessage#decode, there are extra brackets around WRITE_TIMEOUT 
clause. You can remove those and fix indentation. Same happens in 
CAS_UNCERTAINTY case.
{quote}
Removing the brackets in the {{switch-case}} statements gives a compile error, 
since more than one case defines variables with the same name. The brackets 
give each case its own scope for those variables.
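
A minimal sketch of the scoping issue, not the actual {{ErrorMessage#decode}} code. The class, constants, and variable names are made up for illustration. Both cases declare a local named {{detail}}; this compiles only because each case body is wrapped in its own block.

```java
final class SwitchScopeDemo {
    // Hypothetical codes, used only for this illustration.
    static final int WRITE_TIMEOUT = 1;
    static final int CAS_UNCERTAINTY = 2;

    static String describe(int code, int received, int blockFor) {
        switch (code) {
            case WRITE_TIMEOUT: {
                // The braces open a block, so 'detail' is local to this case.
                String detail = received + "/" + blockFor;
                return "write timeout, acks " + detail;
            }
            case CAS_UNCERTAINTY: {
                // Same name as above. Without the braces both declarations
                // would share the switch block and javac would reject the
                // second one as already defined.
                String detail = received + "/" + blockFor;
                return "cas uncertainty, acks " + detail;
            }
            default:
                return "unknown";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(WRITE_TIMEOUT, 1, 2));
        System.out.println(describe(CAS_UNCERTAINTY, 0, 2));
    }
}
```
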
{quote}If we add comments for activate and deactivate for off/on, maybe it's 
worth to call those off/on?
{quote}
Do you mean renaming the methods to activate/deactivate, or changing the 
comments to on/off? Either sounds good to me.

> Add CAS “uncertainty” and “contention" messages that are currently propagated 
> as a WriteTimeoutException.
> ---------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-15350
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-15350
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Feature/Lightweight Transactions
>            Reporter: Alex Petrov
>            Assignee: Yifan Cai
>            Priority: Normal
>              Labels: protocolv5, pull-request-available
>         Attachments: Utf8StringEncodeBench.java
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> Right now, CAS uncertainty introduced in 
> https://issues.apache.org/jira/browse/CASSANDRA-6013 is propagating as 
> WriteTimeout. One of the conditions under which it manifests is when there’s 
> at least one acceptor that has accepted the value, which means that this 
> value _may_ still get accepted during a later round, despite the proposer 
> failure. A similar problem happens with CAS contention, which is also 
> indistinguishable from the “regular” timeout, even though it is visible in 
> metrics correctly.


