[ 
https://issues.apache.org/jira/browse/CASSANDRA-8672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14390979#comment-14390979
 ] 

Stefan Podkowinski commented on CASSANDRA-8672:
-----------------------------------------------

CAS_PREPARE, and the ticket as a whole, is not about splitting up the CAS case. 
The point is that currently I can get a WriteTimeout with WriteType.SIMPLE at 
two different points during execution:

cas() -> beginAndRepairPaxos() -> commitPaxos()
cas() -> commitPaxos()

I've double-checked and can't see that beginAndRepairPaxos() would catch a 
SIMPLE timeout from commitPaxos() and wrap it into a CAS timeout. But IMO this 
should be the preferred way to deal with a SIMPLE timeout during the 
beginAndRepairPaxos() phase. Otherwise the caller would assume that the SIMPLE 
timeout was caused by their own (now accepted) CAS operation and not by the 
previously committed and resumed operation. 
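
A minimal sketch of the wrapping I have in mind. All types here are simplified 
stand-ins written for illustration (the class CasTimeoutSketch, the boolean 
flags and the helper timeoutTypeOf() are hypothetical), not the actual 
StorageProxy code:

```java
// Hypothetical, simplified stand-ins; not Cassandra's real classes or signatures.
final class CasTimeoutSketch {
    enum WriteType { SIMPLE, CAS }

    static class WriteTimeoutException extends RuntimeException {
        final WriteType writeType;

        WriteTimeoutException(WriteType writeType) {
            super("write timeout: " + writeType);
            this.writeType = writeType;
        }
    }

    // Stand-in for commitPaxos(): a timed-out commit surfaces as SIMPLE.
    static void commitPaxos(boolean timesOut) {
        if (timesOut) {
            throw new WriteTimeoutException(WriteType.SIMPLE);
        }
    }

    // Stand-in for beginAndRepairPaxos(): it may first have to finish the
    // pending commit of an earlier operation. A timeout there is re-thrown
    // as CAS, so it cannot be mistaken for a timeout of the caller's commit.
    static void beginAndRepairPaxos(boolean pendingCommitTimesOut) {
        try {
            commitPaxos(pendingCommitTimesOut);
        } catch (WriteTimeoutException e) {
            throw new WriteTimeoutException(WriteType.CAS);
        }
    }

    // Stand-in for cas(): paxos phase first, then the caller's own commit.
    static void cas(boolean pendingCommitTimesOut, boolean ownCommitTimesOut) {
        beginAndRepairPaxos(pendingCommitTimesOut);
        commitPaxos(ownCommitTimesOut);
    }

    // Helper for experimenting: returns the timeout type seen by the caller,
    // or null if cas() completed without a timeout.
    static WriteType timeoutTypeOf(boolean pendingCommitTimesOut, boolean ownCommitTimesOut) {
        try {
            cas(pendingCommitTimesOut, ownCommitTimesOut);
            return null;
        } catch (WriteTimeoutException e) {
            return e.writeType;
        }
    }
}
```

With this, SIMPLE can only come from the second path (cas() -> commitPaxos()), 
and a timeout while completing someone else's pending commit is reported as CAS.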

Basically, the "CAS operations" section of the [error-handling 
blog|http://www.datastax.com/dev/blog/cassandra-error-handling-done-right] 
describes pretty well how the error handling is supposed to work. But the 
current implementation makes the described separate handling of the paxos 
and commit phases impossible, since the WriteType.SIMPLE exception is 
ambiguous and can occur in both the paxos and the commit phase. 
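
To illustrate the per-phase handling on the client side, here is a rough 
sketch. The names (CasClientSketch, Action, onWriteTimeout) are hypothetical 
and not driver API, and the mapping reflects my reading of the advice, namely 
that a timed-out commit will eventually be applied by a later CAS or SERIAL 
read:

```java
// Hypothetical client-side decision table; not part of any driver API.
final class CasClientSketch {
    enum WriteType { SIMPLE, CAS }

    enum Action {
        // Outcome unknown: read with SERIAL consistency and decide from the result.
        SERIAL_READ_THEN_DECIDE,
        // Own update was accepted; a later CAS or SERIAL read should finish the commit.
        TREAT_AS_EVENTUALLY_APPLIED
    }

    static Action onWriteTimeout(WriteType writeType) {
        switch (writeType) {
            case CAS:
                // Timeout during the paxos phase.
                return Action.SERIAL_READ_THEN_DECIDE;
            case SIMPLE:
                // Only safe if SIMPLE can come from the caller's own commit alone.
                return Action.TREAT_AS_EVENTUALLY_APPLIED;
            default:
                throw new IllegalArgumentException("unexpected write type: " + writeType);
        }
    }
}
```

The SIMPLE branch is where the ambiguity bites: as long as a SIMPLE timeout 
can also originate from beginAndRepairPaxos(), the client cannot know that its 
own update was accepted.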


> Ambiguous WriteTimeoutException while completing pending CAS commits
> --------------------------------------------------------------------
>
>                 Key: CASSANDRA-8672
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-8672
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Stefan Podkowinski
>            Assignee: Tyler Hobbs
>            Priority: Minor
>              Labels: CAS
>             Fix For: 3.0
>
>
> Any CAS update has a chance to trigger a pending/stalled commit of any 
> previously agreed on CAS update. After completing the pending commit, the CAS 
> operation will resume to execute the actual update and also possibly create a 
> new commit. See StorageProxy.cas()
> There are two possible execution paths that might end up throwing a 
> WriteTimeoutException:
> cas() -> beginAndRepairPaxos() -> commitPaxos()
> cas() -> commitPaxos()
> Unfortunately, clients catching a WriteTimeoutException won't be able to tell 
> at which stage the commit failed. My guess would be that most developers are 
> not aware that beginAndRepairPaxos() could also trigger a write, and 
> assume that write timeouts refer to a timeout while writing the actual 
> CAS update. It's therefore not safe to assume that successive CAS or SERIAL 
> read operations will cause a (write-)timed-out CAS operation to get 
> eventually applied, although some [best-practices 
> advice|http://www.datastax.com/dev/blog/cassandra-error-handling-done-right] 
> claims otherwise.
> At this point the safest bet is possibly to retry the complete business 
> transaction in case of a WriteTimeoutException. However, as there's a chance 
> that the timeout occurred while writing the actual CAS operation, another 
> write could potentially complete it and our CAS condition will get a 
> different result upon retry.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
