[
https://issues.apache.org/jira/browse/CASSANDRA-9328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14974330#comment-14974330
]
Sylvain Lebresne commented on CASSANDRA-9328:
---------------------------------------------
bq. you shouldn't retry WTEs
You'll note I didn't mention the term "retry", because in the case of LWT,
"handling" WTEs is indeed almost always more involved than a simple retry
(since, as you point out, LWT updates won't be idempotent). But "more involved"
does not equate to "cannot ever be dealt with". Your application will indeed
most of the time have to do a read on a WTE to figure out what the state is and
what you should do. And that does mean you have to model things in a way that
allows such recovery on WTE.
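To make the "read on a WTE" idea concrete, here is a minimal sketch, not anything taken from Cassandra or a driver. Everything in it is hypothetical: `FakeRow` is an in-memory stand-in for a row updated through LWT, `UnknownOutcome` stands in for a WTE on a CAS write, and the `applied_ops` set is one assumed way of modelling the data so that a later read can tell whether a given update went through. Against a real cluster the recovery read would presumably have to be a SERIAL read.

```python
import uuid


class UnknownOutcome(Exception):
    """Hypothetical stand-in for a WTE on a CAS write: applied or not, we can't tell."""


class FakeRow:
    """In-memory stand-in for one row updated via LWT (not a Cassandra API)."""

    def __init__(self, balance, faults=()):
        self.balance = balance
        self.applied_ops = set()      # ids of the updates that were applied
        self._faults = list(faults)   # scripted fault for each successive cas() call

    def read(self):
        return self.balance, frozenset(self.applied_ops)

    def cas(self, expected, new_balance, op_id):
        fault = self._faults.pop(0) if self._faults else "ok"
        if fault == "timeout_not_applied":
            raise UnknownOutcome()    # nothing was written
        if self.balance != expected:
            return False              # condition did not hold, nothing written
        self.balance = new_balance
        self.applied_ops.add(op_id)
        if fault == "timeout_applied":
            raise UnknownOutcome()    # written, but the client is not told
        return True


def add_once(row, amount, max_attempts=5):
    """Apply a non-idempotent increment at most once, reading after each timeout."""
    op_id = uuid.uuid4().hex          # identifies this logical update across attempts
    for _ in range(max_attempts):
        balance, applied = row.read()
        if op_id in applied:
            return                    # an earlier attempt that timed out did apply
        try:
            if row.cas(balance, balance + amount, op_id):
                return
        except UnknownOutcome:
            pass                      # outcome unknown: loop, read, and find out
    raise RuntimeError("gave up after %d attempts" % max_attempts)
```

A blind retry after the timeout would add the amount twice whenever the first attempt had in fact been applied. Recording an operation id next to the value is what lets the read tell the two cases apart.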
bq. If CAS is not atomic
Not sure where that hypothesis comes from. CAS is atomic: either all of it will
be applied or none of it will. It just happens that there are some situations
where you, the client, won't know which one that is. That property is btw not
at all specific to Cassandra: take any transaction in any SQL database, if your
server dies during the request, your client just won't know whether it's been
applied or not.
Don't get me wrong, as said earlier, the fact that we throw WTE much more often
than we really should is a shame: it's a potentially big performance penalty in
particular. But that doesn't mean CAS isn't atomic, nor that you can't use it
for non-idempotent operations.
Now, I'm happy to have a debate on our CAS implementation and its limitations,
but the mailing list is probably a better venue for that. Regarding this ticket
(the fact that WTE can be thrown early when there is contention), as I said in
a previous comment, no one has so far come up with any idea for how to fix it,
so I'll close this as "won't fix", which in this case means: this is a known
limitation for which we have no short-term fix. Feel free to re-open if you
have a solution to offer. Hopefully, in the long term, moving to EPaxos
(CASSANDRA-6246) might make that better.
> WriteTimeoutException thrown when LWT concurrency > 1, despite the query
> duration taking MUCH less than cas_contention_timeout_in_ms
> ------------------------------------------------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-9328
> URL: https://issues.apache.org/jira/browse/CASSANDRA-9328
> Project: Cassandra
> Issue Type: Bug
> Components: Core
> Reporter: Aaron Whiteside
> Priority: Critical
> Fix For: 2.1.x
>
> Attachments: CassandraLWTTest.java, CassandraLWTTest2.java
>
>
> WriteTimeoutException thrown when LWT concurrency > 1, despite the query
> duration taking MUCH less than cas_contention_timeout_in_ms.
> Unit test attached, run against a 3 node cluster running 2.1.5.
> If you reduce the threadCount to 1, you never see a WriteTimeoutException. If
> the WTE is due to not being able to communicate with other nodes, why does
> the concurrency >1 cause inter-node communication to fail?
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)