[ https://issues.apache.org/jira/browse/CASSANDRA-15350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16986310#comment-16986310 ]
Yifan Cai commented on CASSANDRA-15350:
---------------------------------------

[~ifesdjeen] and [~spod], big thanks for reviewing the patch!

Renaming the exception to {{CasWriteStalledException}} and the suggested rephrased description for {{CAS_UNCERTAINTY}} both sound good. The meaning of {{CasWriteUncertainty}} is vague and, as was pointed out, the WriteTimeoutException (WTE) indicates that the result is uncertain too. {{CasWriteStalled}} describes what happened better. I meant to put _Paxos read_ when writing the description for the {{CAS_UNCERTAINTY}} clause. I will update the description, taking the suggested wording into account.

Regarding the cross-version scenarios, I may be wrong, but my current understanding is that the ErrorMessage is *not* involved in internode messaging. ErrorMessage, derived from {{org.apache.cassandra.transport.Message.Response}}, is client-facing. When a sub-V5 (i.e. V4) client connects to the V5 server and gets the {{CasWriteTimeoutException}}, the server encoding makes sure to produce a backward-compatible one, so the sub-V5 client can still understand the server response. The {{decode}} method in {{ErrorMessage}} seems to be useful only for {{org.apache.cassandra.transport.SimpleClient/Client}}, which is not started in the Cassandra server.

{quote}Unless you're submitting patches to 2.2, 3.0, and 3.11, let's roll back changes to IMessageFilters, since their API has to be binary compatible with older versions. {quote}
The test cases in {{CasWriteTest}} rely on the message intercept function. I will back-port the changes to IMessageFilter to the prior versions.

{quote}Should we add timeout tests for responses as well as requests in CasWriteTest? {quote}
Sure. Sounds good.

{quote}Is it possible to try and simplify testCasWriteTimeoutDueToContention, can we achieve contention with N threads? {quote}
The test does achieve contention with N threads (1 thread per client).
In addition, the scenario was carefully crafted to be deterministic and aims to produce the same kind of contention on every run.

{quote}both tests peer quite a lot into implementation internals. {quote}
The test cases mainly manipulate the internode networking to introduce latency/partition. In order to produce (and always produce) a rare contention scenario, I think that fine-grained control is necessary.

{quote}In ErrorMessage#decode, there are extra brackets around WRITE_TIMEOUT clause. You can remove those and fix indentation. Same happens in CAS_UNCERTAINTY case. {quote}
Removing the brackets in the {{switch-case}} statements gives a syntax error, since we are defining variables with the same name in both cases. The brackets scope the variables to each case.

{quote}If we add comments for activate and deactivate for off/on, maybe it's worth to call those off/on? {quote}
Do you mean renaming the methods to activate/deactivate, or changing the comments to on/off? Both sound good to me.

> Add CAS “uncertainty” and “contention” messages that are currently propagated
> as a WriteTimeoutException.
> ---------------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-15350
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15350
> Project: Cassandra
> Issue Type: Improvement
> Components: Feature/Lightweight Transactions
> Reporter: Alex Petrov
> Assignee: Yifan Cai
> Priority: Normal
> Labels: protocolv5, pull-request-available
> Attachments: Utf8StringEncodeBench.java
>
> Time Spent: 20m
> Remaining Estimate: 0h
>
> Right now, CAS uncertainty introduced in
> https://issues.apache.org/jira/browse/CASSANDRA-6013 is propagating as
> WriteTimeout. One of the conditions under which it manifests is when there’s at least
> one acceptor that has accepted the value, which means that this value _may_
> still get accepted during the later round, despite the proposer failure.
> Similar problem happens with CAS contention, which is also indistinguishable
> from the “regular” timeout, even though it is visible in metrics correctly.
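
For reference, on the point in the comment about the brackets in the {{switch-case}} statements: in Java the whole body of a {{switch}} is a single block, so two cases that declare a local variable with the same name only compile when each case is wrapped in its own braces. The sketch below illustrates this with made-up constants and message strings. {{SwitchScopeDemo}}, {{describe}}, the numeric codes, and the {{received}}/{{blockFor}} parameters are assumptions for illustration and are not taken from the actual {{ErrorMessage#decode}} implementation.

```java
// Illustration only: hypothetical codes and fields, not Cassandra's ErrorMessage.
final class SwitchScopeDemo {
    // Made-up numeric values; the real protocol error codes are not reproduced here.
    static final int WRITE_TIMEOUT = 1;
    static final int CAS_UNCERTAINTY = 2;

    static String describe(int code, int received, int blockFor) {
        switch (code) {
            case WRITE_TIMEOUT: {
                // The braces open a new block, so these locals exist only in this case.
                int missing = blockFor - received;
                String detail = "write timeout, missing " + missing + " ack(s)";
                return detail;
            }
            case CAS_UNCERTAINTY: {
                // Same variable names as above. Without the per-case braces, javac
                // rejects this with "variable missing is already defined".
                int missing = blockFor - received;
                String detail = "CAS uncertainty, missing " + missing + " ack(s)";
                return detail;
            }
            default:
                return "unknown error code " + code;
        }
    }

    public static void main(String[] args) {
        System.out.println(describe(WRITE_TIMEOUT, 1, 2));
        System.out.println(describe(CAS_UNCERTAINTY, 0, 2));
    }
}
```

An alternative would be to give the variables distinct names per case, but keeping the braces leaves each clause self-contained, which matches the reasoning given in the comment.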