[jira] [Commented] (CASSANDRA-9328) WriteTimeoutException thrown when LWT concurrency > 1, despite the query duration taking MUCH less than cas_contention_timeout_in_ms

Sylvain Lebresne (JIRA) Mon, 26 Oct 2015 07:55:47 -0700

    [ 
https://issues.apache.org/jira/browse/CASSANDRA-9328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14974330#comment-14974330
 ]


Sylvain Lebresne commented on CASSANDRA-9328:
---------------------------------------------

bq. you shouldn't retry WTEs

You'll note I didn't mention the term "retry", because in the case of LWT, 
"handling" WTEs is indeed almost always more involved than a simple retry 
(since, as you point out, an LWT update won't be idempotent). But "more involved" 
does not equate to "cannot ever be dealt with". Your application will indeed, most 
of the time, have to do a read on a WTE to figure out what the state is and what 
you should do. And that does mean you have to model things in a way that allows 
such recovery on WTE.
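
To illustrate the read-on-WTE recovery described above, here is a minimal, self-contained sketch. It does not use the Cassandra driver: FakeStore, UnknownOutcomeException, insertIfAbsent and acquire are all hypothetical names standing in for a table, the driver's WriteTimeoutException, and an "INSERT ... IF NOT EXISTS" statement. The assumption is that each writer stores a unique token with its update, so that a later read can tell whether its own write was the one applied. Against a real cluster the recovery read would normally be done at SERIAL consistency.

```java
import java.util.HashMap;
import java.util.Map;

public class LwtRecoverySketch {

    /** Hypothetical stand-in for a write timeout: the CAS outcome is unknown. */
    static class UnknownOutcomeException extends RuntimeException {
        UnknownOutcomeException(String message) {
            super(message);
        }
    }

    /** Hypothetical in-memory store offering one conditional insert. */
    static class FakeStore {
        private final Map<String, String> owners = new HashMap<>();
        boolean timeoutNext;        // knob: make the next CAS report a timeout
        boolean applyBeforeTimeout; // knob: does the timed-out write still land?

        /** Mimics INSERT ... IF NOT EXISTS; returns true if the insert applied. */
        synchronized boolean insertIfAbsent(String key, String token) {
            if (timeoutNext) {
                timeoutNext = false;
                if (applyBeforeTimeout) {
                    owners.putIfAbsent(key, token);
                }
                throw new UnknownOutcomeException("CAS outcome unknown");
            }
            return owners.putIfAbsent(key, token) == null;
        }

        /** Returns the token currently stored for the key, or null if none. */
        synchronized String read(String key) {
            return owners.get(key);
        }
    }

    /**
     * Tries to claim the key with the given token. On an unknown outcome it
     * does not blindly retry; it reads the state back and checks whether the
     * stored token is its own.
     */
    static boolean acquire(FakeStore store, String key, String token) {
        try {
            return store.insertIfAbsent(key, token);
        } catch (UnknownOutcomeException e) {
            return token.equals(store.read(key));
        }
    }

    public static void main(String[] args) {
        FakeStore applied = new FakeStore();
        applied.timeoutNext = true;
        applied.applyBeforeTimeout = true;
        System.out.println("timed out but applied: "
                + acquire(applied, "lock", "client-a"));

        FakeStore notApplied = new FakeStore();
        notApplied.timeoutNext = true;
        notApplied.applyBeforeTimeout = false;
        System.out.println("timed out and not applied: "
                + acquire(notApplied, "lock", "client-a"));
    }
}
```

When acquire returns false after a timeout, the caller knows its write did not land and can decide whether to issue a fresh attempt; the unique token is what makes that decision safe for a non-idempotent update.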

bq. If CAS is not atomic

Not sure where that hypothesis comes from. CAS is atomic: either all of it will 
be applied or none of it will. It just happens that there are some situations 
where you, the client, won't know which one it is. That property is btw not 
at all specific to Cassandra: take any transaction in any SQL database, if your 
server dies during the request, your client just won't know whether it's been 
applied or not.

Don't get me wrong, as said earlier, the fact that we throw WTE much more often 
than we really should is a shame: it's a potentially big performance penalty in 
particular. But that doesn't mean CAS isn't atomic, nor that you can't use it for 
non-idempotent operations.

Now, I'm happy to have a debate on our CAS implementation and its limitations, 
but the mailing list is probably a better venue for that. Regarding this ticket (the 
fact that WTE can be thrown early when there is contention), as I said in a 
previous comment, no-one has so far come up with any idea for how to fix it, so 
I'll close this as "won't fix", which in this case means: this is a known 
limitation for which we have no short-term fix. Feel free to re-open if you 
have a solution to offer. Hopefully, in the long term, moving to EPaxos 
(CASSANDRA-6246) might make that better. 

> WriteTimeoutException thrown when LWT concurrency > 1, despite the query 
> duration taking MUCH less than cas_contention_timeout_in_ms
> ------------------------------------------------------------------------------------------------------------------------------------
>
>                 Key: CASSANDRA-9328
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-9328
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Aaron Whiteside
>            Priority: Critical
>             Fix For: 2.1.x
>
>         Attachments: CassandraLWTTest.java, CassandraLWTTest2.java
>
>
> WriteTimeoutException thrown when LWT concurrency > 1, despite the query 
> duration taking MUCH less than cas_contention_timeout_in_ms.
> Unit test attached, run against a 3 node cluster running 2.1.5.
> If you reduce the threadCount to 1, you never see a WriteTimeoutException. If 
> the WTE is due to not being able to communicate with other nodes, why does 
> the concurrency >1 cause inter-node communication to fail?



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
