[
https://issues.apache.org/jira/browse/CASSANDRA-2494?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13020867#comment-13020867
]
Peter Schuller commented on CASSANDRA-2494:
-------------------------------------------
As far as I can tell, the consistency being asked for was never promised by
Cassandra and is in fact not expected.
The expected behavior of writes is that they propagate; the difference between
ONE and QUORUM is just how many replicas are required to receive a write before
success is returned to the client. For reads, that means you may get lucky at
ONE or you may get lucky at QUORUM; the positive guarantee is in the case of a
*completing* QUORUM write followed by a QUORUM read.
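To make the overlap argument concrete, here is a minimal sketch (not Cassandra
code; the function names are made up for illustration) that enumerates every
possible set of replicas acknowledging a write and every possible set answering
a read, and checks whether the two sets always share at least one replica:

```python
from itertools import combinations


def quorum(rf):
    """Number of replicas in a quorum for replication factor rf."""
    return rf // 2 + 1


def always_overlap(rf, w, r):
    """True if every set of w replicas shares a member with every set of r replicas."""
    replicas = range(rf)
    return all(
        set(write_set) & set(read_set)
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


RF = 3
q = quorum(RF)
print(always_overlap(RF, q, q))  # completed QUORUM write, then QUORUM read
print(always_overlap(RF, 1, q))  # write acknowledged by one replica, then QUORUM read
```

With RF=3 the first check holds and the second does not, which is the "you may
get lucky" case: a QUORUM read can miss a write that only one replica has so far.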
So just to be clear, although I don't think this is what is being asked for: As
far as I know, it has never been the case, nor the intent to promise, that a
write which fails is guaranteed not to eventually complete. Simply "fixing"
reads is not enough; by design the data will be replicated during read-repair
and AES (anti-entropy) - this is how consistency is achieved in Cassandra.
However, it sounds like what is being asked for is not that they don't
propagate in the event of a write "failure", but just that reads don't see the
writes until they are sufficiently propagated to guarantee that any future
QUORUM read will also see the data. I can understand that is desirable, in the
sense of achieving monotonically forward-moving data as the benchmark/test from
the e-mail thread does. Another way to look at it is that maybe you never want to
read data successfully prior to achieving a certain level of replication, in
order to avoid a client ever seeing data that may suddenly go away due to e.g.
a node failure in spite of said failure not exceeding the number of failures
the cluster was designed to survive.
So the key point would be guaranteeing that any "future QUORUM read will see
the data or data that subsequently overwrote it"; actively read-repairing and
waiting for the repair to complete would take care of that. The important part
is ensuring that a quorum of nodes has seen the data; one should not wait for a
quorum to agree on the *current* version of the data, as that would create
potentially unbounded round-trips on hotly contended data.
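A toy model of that idea, using the X/Y/Z scenario from the issue description.
This is only a sketch under simplifying assumptions (single key, no
concurrency, last-write-wins by timestamp); Replica, quorum_read and scenario
are hypothetical names, not Cassandra internals:

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    ts: int = 0           # timestamp of the newest write this replica holds
    value: object = None

    def apply(self, ts, value):
        # Last-write-wins: only a newer timestamp replaces the stored value.
        if ts > self.ts:
            self.ts, self.value = ts, value


def quorum_read(contacted, rf, wait_for_repair):
    """Return the newest (ts, value) among the contacted replicas.

    With wait_for_repair, the result is first pushed to the contacted
    replicas, and the read only returns once a quorum holds a version at
    least as new as the one being returned.
    """
    newest = max(contacted, key=lambda r: r.ts)
    ts, value = newest.ts, newest.value
    if wait_for_repair:
        for r in contacted:
            r.apply(ts, value)
        # Count ">= ts", not "== latest": replicas that have moved on to a
        # newer version still count, so contention cannot stall the read.
        if sum(1 for r in contacted if r.ts >= ts) < rf // 2 + 1:
            raise RuntimeError("repair did not reach a quorum")
    return ts, value


def scenario(wait_for_repair):
    x, y, z = Replica("X"), Replica("Y"), Replica("Z")
    for r in (x, y, z):
        r.apply(1, "old")
    x.apply(2, "N")  # the write of N reached X only
    first = quorum_read([x, y], 3, wait_for_repair)
    second = quorum_read([y, z], 3, wait_for_repair)
    return first, second


print(scenario(wait_for_repair=False))  # second read can go backwards
print(scenario(wait_for_repair=True))   # second read also sees N
```

Without the wait, the read served by X and Y returns N and a later read served
by Y and Z returns the older value; with it, Y has N before the first read
returns, so any later quorum must include a replica holding N.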
One thing to consider: one might think about cases where read-repair is currently
not done, like range slices, and how an implementation that requires read
repair for consistency affects that.
> Quorum reads are not consistent
> -------------------------------
>
> Key: CASSANDRA-2494
> URL: https://issues.apache.org/jira/browse/CASSANDRA-2494
> Project: Cassandra
> Issue Type: Bug
> Reporter: Sean Bridges
>
> As discussed in this thread,
> http://www.mail-archive.com/[email protected]/msg12421.html
> Quorum reads should be consistent. Assume we have a cluster of 3 nodes
> (X,Y,Z) and a replication factor of 3. If a write of N is committed to X, but
> not Y and Z, then a read from X should not return N unless the read is
> committed to at least two nodes. To ensure this, a read from X should wait
> for an ack of the read repair write from either Y or Z before returning.
> Are there system tests for cassandra? If so, there should be a test similar
> to the original post in the email thread. One thread should write 1,2,3...
> at consistency level ONE. Another thread should read at consistency level
> QUORUM from a random host, and verify that each read is >= the last read.
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira