[ 
https://issues.apache.org/jira/browse/CASSANDRA-8672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14390450#comment-14390450
 ] 

Sylvain Lebresne commented on CASSANDRA-8672:
---------------------------------------------

bq. It's therefore not safe to assume that successive CAS or SERIAL read 
operations will cause a (write-)timed-out CAS operation to get eventually 
applied.

The only case where you can assume that is if the timeout happens during the 
{{commitPaxos}} phase. But we already distinguish timeouts in that case since 
they have a {{WriteType.SIMPLE}}.
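
As a rough illustration of that distinction, a client could branch on the write type carried by the timeout. This is only a sketch: the {{WriteType}} enum below is a local stand-in for the driver's, {{TimeoutMeaning}} and {{CasTimeoutClassifier}} are hypothetical names, and it assumes that timeouts in the earlier Paxos phases are reported with {{WriteType.CAS}}.

```java
// Local stand-in for the driver's WriteType; only the two values discussed here.
enum WriteType { SIMPLE, CAS }

// Hypothetical labels for what a CAS write timeout tells the client.
enum TimeoutMeaning {
    COMMIT_PHASE,    // a Paxos commit timed out (commitPaxos)
    OUTCOME_UNKNOWN  // an earlier phase timed out; the update may or may not land
}

final class CasTimeoutClassifier {
    private CasTimeoutClassifier() {}

    // Maps the write type of a timeout on a CAS statement to its meaning.
    // Note: per the ticket description, a commit can also be the repair of an
    // earlier in-progress update, so COMMIT_PHASE alone does not say whose
    // update was being committed.
    static TimeoutMeaning classify(WriteType writeType) {
        switch (writeType) {
            case SIMPLE:
                return TimeoutMeaning.COMMIT_PHASE;
            case CAS:
                return TimeoutMeaning.OUTCOME_UNKNOWN;
            default:
                throw new IllegalArgumentException("unexpected write type: " + writeType);
        }
    }
}
```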

bq. However, as there's a chance that the timeout occurred while writing the 
actual CAS operation, another write could potentially complete it and our CAS 
condition will get a different result upon retry.

Yes, that *is* a problem you have to deal with at the application level. But no 
amount of disambiguation of {{WriteTimeoutException}} will remove this: if 
there is a timeout during the "propose" phase, we just don't know if the write 
will be eventually applied or not. In other words, in many cases, the retry 
mechanism that you'll have to implement client side will involve a SERIAL read 
to figure out if the write was applied or not.
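
A minimal sketch of such a client-side retry, under stated assumptions: {{CasStore}}, {{CasTimeoutException}}, {{CasOutcome}} and {{CasRetry}} are hypothetical names, not driver APIs, and the new value is assumed to be distinctive enough that reading it back identifies our own write. After a timeout the outcome is treated as unknown, and a SERIAL read is done before the CAS is attempted again.

```java
// Hypothetical client-side view of a single-value CAS store; not a real driver API.
interface CasStore {
    // Returns true if the update was applied, false if the condition did not hold.
    boolean compareAndSet(String key, String expected, String update) throws CasTimeoutException;

    // SERIAL read of the current value.
    String serialRead(String key) throws CasTimeoutException;
}

// Hypothetical stand-in for a write or read timeout.
class CasTimeoutException extends Exception {
    CasTimeoutException(String message) { super(message); }
}

enum CasOutcome { APPLIED, CONDITION_FAILED, GAVE_UP }

final class CasRetry {
    private CasRetry() {}

    static CasOutcome casWithRetry(CasStore store, String key, String expected,
                                   String update, int maxAttempts) {
        boolean outcomeUnknown = false; // true once a timeout left the write in doubt
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                if (outcomeUnknown) {
                    // Resolve the doubt first: did the earlier attempt land?
                    String current = store.serialRead(key);
                    if (java.util.Objects.equals(current, update)) {
                        return CasOutcome.APPLIED;
                    }
                    if (!java.util.Objects.equals(current, expected)) {
                        // Someone else changed the value (or overwrote ours;
                        // telling these apart needs application-level markers).
                        return CasOutcome.CONDITION_FAILED;
                    }
                    outcomeUnknown = false; // still the old value: safe to try again
                }
                return store.compareAndSet(key, expected, update)
                        ? CasOutcome.APPLIED
                        : CasOutcome.CONDITION_FAILED;
            } catch (CasTimeoutException e) {
                outcomeUnknown = true; // either call timed out; re-check on the next attempt
            }
        }
        return CasOutcome.GAVE_UP;
    }
}
```

The read-before-retry ordering matters here: retrying the CAS blindly after a timeout could report a failed condition even though the first attempt was in fact applied.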

bq. I think the best option is to add a new {{WriteType.CAS_PREPARE}}

We could add a different {{WriteType}} if the timeout is during the prepare 
phase, and clients could have an easier time retrying in that case since they 
can assume the update hasn't been applied, but as said above, this wouldn't 
change the fact that you will have to handle timeouts during the propose phase 
the hard way. And given that you have to do the latter, I wonder how meaningful 
it is to optimize for the former. I also don't know if that kind of subtle 
difference will still make sense post-CASSANDRA-6246. So I'm not strongly 
opposed, but I do wonder if exposing that kind of subtlety won't confuse users 
more than it will help and/or have a short shelf life (due to CASSANDRA-6246).

> Ambiguous WriteTimeoutException while completing pending CAS commits
> --------------------------------------------------------------------
>
>                 Key: CASSANDRA-8672
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8672
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Stefan Podkowinski
>            Assignee: Tyler Hobbs
>            Priority: Minor
>              Labels: CAS
>             Fix For: 3.0
>
>
> Any CAS update has a chance to trigger a pending/stalled commit of any 
> previously agreed-upon CAS update. After completing the pending commit, the 
> CAS operation will go on to execute the actual update and possibly also 
> create a new commit. See StorageProxy.cas()
> There are two possible execution paths that might end up throwing a 
> WriteTimeoutException:
> cas() -> beginAndRepairPaxos() -> commitPaxos()
> cas() -> commitPaxos()
> Unfortunately, clients catching a WriteTimeoutException won't be able to tell 
> at which stage the commit failed. My guess would be that most developers are 
> not aware that beginAndRepairPaxos() could also trigger a write, and 
> assume that write timeouts refer to a timeout while writing the actual 
> CAS update. It's therefore not safe to assume that successive CAS or SERIAL 
> read operations will cause a (write-)timed-out CAS operation to get 
> eventually applied, although some [best-practices 
> advice|http://www.datastax.com/dev/blog/cassandra-error-handling-done-right] 
> claims otherwise.
> At this point the safest bet is possibly to retry the complete business 
> transaction in case of a WriteTimeoutException. However, as there's a chance 
> that the timeout occurred while writing the actual CAS operation, another 
> write could potentially complete it and our CAS condition will get a 
> different result upon retry.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
