[
https://issues.apache.org/jira/browse/CASSANDRA-12126?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17227354#comment-17227354
]
Benedict Elliott Smith commented on CASSANDRA-12126:
----------------------------------------------------
To some extent that is all up for debate.
My plan so far has been to avoid interfering with the 4.0 release, so I have
been working towards targeting 4.x. This would also permit time to produce
documentation and reach out to the list to begin the slow handshake to see if
the project wants the work, and in what manner. However, the main body of work
is essentially complete, so it is possible that this could be brought forward
if there were appetite.
As to target version, it would be possible to target 3.0+, at least for a
portion of the work that would encompass this issue, without a great deal of
work. The project's appetite would be the main decider, as it's a significant
body of work.
The main contribution would be a parallel implementation of the same
underlying Paxos algorithm that is able to run concurrently alongside the
existing one (supporting live migration), but with several latency
improvements as well as several correctness fixes. Alongside this is related
work to guarantee linearizability across range movements, in the form of
modifications to repair, bootstrap, replace, etc.
Related to this work are several patches to wider Cassandra to support
automated verification of its correctness, by permitting deterministic
simulation of Cassandra clusters with adversarial ordering of events. We have
so far simulated billions of transactions to verify its linearizability. I
anticipate that these patches will be useful for the project's overall goal of
improving quality, but they are themselves quite significant and will require
their own discussions around timeline and scope.
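To make the idea of deterministic simulation concrete, here is a minimal
sketch. It is not the simulator from the patches described above: the
function name, the single shared register, and the use of a seeded random
scheduler as a stand-in for adversarial ordering are all assumptions made
for illustration. The point it shows is that when the scheduler's seed is
the only source of nondeterminism, any interleaving can be replayed exactly.

```python
import random


def simulate(seed, clients=3):
    """Run a toy schedule; the seed fully determines the event ordering."""
    rng = random.Random(seed)  # the only source of nondeterminism
    # Each hypothetical client writes its own id, then reads the register.
    queues = {c: [("write", c), ("read", None)] for c in range(clients)}
    register, trace = None, []
    while any(queues.values()):
        ready = sorted(c for c, q in queues.items() if q)
        client = rng.choice(ready)  # scheduler decides who runs next
        op, arg = queues[client].pop(0)
        if op == "write":
            register = arg
        # Record the register value observed after this event.
        trace.append((client, op, register))
    return trace
```

Running simulate with the same seed twice yields the same trace, so a
failing seed can be replayed while debugging; sweeping over many seeds
explores different interleavings.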
> CAS Reads Inconsistencies
> --------------------------
>
> Key: CASSANDRA-12126
> URL: https://issues.apache.org/jira/browse/CASSANDRA-12126
> Project: Cassandra
> Issue Type: Bug
> Components: Feature/Lightweight Transactions, Legacy/Coordination
> Reporter: Sankalp Kohli
> Assignee: Sylvain Lebresne
> Priority: Normal
> Labels: LWT, pull-request-available
> Fix For: 3.0.x, 3.11.x, 4.0-beta
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> While looking at the CAS code in Cassandra, I found a potential issue with
> CAS reads. Here is how it can happen with RF=3:
> 1) You issue a CAS write and it fails in the propose phase. Machine A replies
> true to the propose and saves the commit in its accepted field. The other two
> machines, B and C, do not get to the accept phase.
> The current state is that machine A has this commit in the paxos table as
> accepted but not committed, and B and C do not.
> 2) Issue a CAS read and it goes to only B and C. You won't be able to read the
> value written in step 1. This step behaves as if nothing is in flight.
> 3) Issue another CAS read and it goes to A and B. Now we will discover that
> there is something in flight from A and will propose and commit it with the
> current ballot. Now we can read the value written in step 1 as part of this
> CAS read.
> If we skip step 3 and instead run step 4, we will never learn about the value
> written in step 1.
> 4) Issue a CAS write and it involves only B and C. This will succeed and
> commit a different value than step 1. The step 1 value will never be seen
> again and was never seen before.
> Section 2.3 of Lamport's “Paxos Made Simple” paper talks about this issue,
> which is how learners can find out whether a majority of the acceptors have
> accepted the proposal.
> In step 3, it is correct that we propose the value again, since we don't know
> whether it was accepted by a majority of acceptors. When we ask a majority of
> acceptors, and one or more acceptors but not a majority have something in
> flight, we have no way of knowing whether it was accepted by a majority of
> acceptors. So this behavior is correct.
> However, we need to fix step 2, since it causes reads to not be linearizable
> with respect to writes and other reads. In this case, we know that a majority
> of acceptors have no in-flight commit, which means we know that nothing was
> accepted by a majority. I think we should run a propose step here with an
> empty commit, which will cause the write from step 1 to never be visible
> afterwards.
> With this fix, we will either see the data written in step 1 on the next
> serial read or never see it, which is what we want.
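
The scenario in the issue description can be replayed with a toy model.
This is only a sketch under stated assumptions, not Cassandra's code: the
Replica class, the EMPTY marker, the serial_read helper and the
propose_empty_when_idle flag are hypothetical names, ballots are plain
integers, and every message is assumed to be delivered. It runs steps 1 to
3 with and without the empty proposal suggested for step 2.

```python
EMPTY = "<empty>"  # hypothetical marker for an empty proposal


class Replica:
    def __init__(self):
        self.promised = 0      # highest ballot promised so far
        self.accepted = None   # (ballot, value) accepted but not committed
        self.committed = None  # last committed value

    def prepare(self, ballot):
        if ballot <= self.promised:
            raise ValueError("stale ballot")  # toy model: no retry logic
        self.promised = ballot
        return self.accepted

    def propose(self, ballot, value):
        if ballot < self.promised:
            return False
        self.promised = ballot
        self.accepted = (ballot, value)
        return True

    def commit(self, ballot, value):
        if value != EMPTY:
            self.committed = value
        if self.accepted is not None and self.accepted[0] <= ballot:
            self.accepted = None


def serial_read(quorum, ballot, propose_empty_when_idle):
    replies = [r.prepare(ballot) for r in quorum]
    in_flight = [p for p in replies if p is not None]
    if in_flight:
        # Finish the highest-ballot in-flight proposal under our ballot.
        _, value = max(in_flight, key=lambda p: p[0])
        for r in quorum:
            r.propose(ballot, value)
        for r in quorum:
            r.commit(ballot, value)
    elif propose_empty_when_idle:
        # Suggested fix: an empty proposal supersedes any minority accept.
        for r in quorum:
            r.propose(ballot, EMPTY)
    return quorum[0].committed


def run_scenario(propose_empty_when_idle):
    a, b, c = Replica(), Replica(), Replica()
    for r in (a, b, c):
        r.prepare(1)
    a.propose(1, "v1")  # step 1: only A accepts the write
    first = serial_read([b, c], 2, propose_empty_when_idle)   # step 2
    second = serial_read([a, b], 3, propose_empty_when_idle)  # step 3
    return first, second
```

Without the empty proposal, the second read completes A's stale accept and
returns the step 1 value that the first read did not see. With it, the
empty proposal accepted by B carries a higher ballot than A's accept, so
the second read finishes the empty proposal instead and the step 1 value
never becomes visible.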