[
https://issues.apache.org/jira/browse/CASSANDRA-16807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Caleb Rackliffe updated CASSANDRA-16807:
----------------------------------------
Bug Category: Parent values: Correctness(12982)Level 1 values: Transient
Incorrect Response(12987)
Complexity: Normal
Discovered By: Fuzz Test
Fix Version/s: 4.x
4.0.x
Severity: Critical
Status: Open (was: Triage Needed)
There is perhaps a question of whether to fix this in 3.x, but the exact nature
of the fix may be a little different, given the refactoring that happened in
4.0. Perhaps a bigger question than that is whether this warrants inclusion in
4.0.0, but given it probably isn't entirely a regression from 3.x (i.e. the
assertion is new in 4.0.0 but not the underlying issue), I could see a case for
it not blocking 4.0 GA.
> Weak visibility guarantees of Accumulator lead to failed assertions during
> digest comparison
> --------------------------------------------------------------------------------------------
>
> Key: CASSANDRA-16807
> URL: https://issues.apache.org/jira/browse/CASSANDRA-16807
> Project: Cassandra
> Issue Type: Bug
> Components: Consistency/Coordination
> Reporter: Caleb Rackliffe
> Assignee: Caleb Rackliffe
> Priority: Normal
> Fix For: 4.0.x, 4.x
>
>
> This problem could manifest on all versions, beginning on at least 3.0, but
> I’ll focus on the way it manifests in 4.0 here.
> In what now seems like a wise move, CASSANDRA-16097 added an assertion to
> {{DigestResolver#responseMatch()}} that ensures the responses snapshot has at
> least one visible elements to compare (although of course only one element
> trivially cannot generate a mismatch and short-circuits immediately).
> However, at the point {{ReadCallback#onResponse()}} signals the waiting
> resolver, there is no guarantee that the size of the generated snapshot of
> the responses {{Accumulator}} is non-zero, or perhaps more worryingly, at
> least equal to the number of blocked-for responses. This seems to be a
> consequence of the documented weak visibility guarantees on
> {{Accumulator#add()}}. In short, if there are concurrent invocations of
> add(), is it not guaranteed that there is any visible size change after any
> one of them return, but only after all complete.
> The particular exception looks something like this:
> {noformat}
> java.lang.AssertionError: Attempted response match comparison while no
> responses have been received.
> at
> org.apache.cassandra.service.reads.DigestResolver.responsesMatch(DigestResolver.java:110)
> at
> org.apache.cassandra.service.reads.AbstractReadExecutor.awaitResponses(AbstractReadExecutor.java:393)
> at
> org.apache.cassandra.service.StorageProxy.fetchRows(StorageProxy.java:2150)
> at
> org.apache.cassandra.service.StorageProxy.readRegular(StorageProxy.java:1979)
> at
> org.apache.cassandra.service.StorageProxy.read(StorageProxy.java:1882)
> at
> org.apache.cassandra.db.SinglePartitionReadCommand$Group.execute(SinglePartitionReadCommand.java:1121)
> at
> org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:296)
> at
> org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:248)
> at
> org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:90)
> {noformat}
> It’s possible to reproduce this on simple single-partition reads without any
> short-read protection or replica filtering protection. I’ve also been able to
> reproduce this synthetically with a unit test on {{ReadCallback}}.
> It seems like the most straightforward way to fix this would be to avoid
> signaling in {{ReadCallback#onResponse()}} until the visible size of the
> accumulator is at least the number of received responses. In most cases, this
> is trivially true, and our signaling behavior won’t change at all. In the
> very rare case that there are two (or more) concurrent calls to
> {{onResponse()}}, the second (or last) will signal, and having one more
> response than we strictly need should have no negative side effects. (We
> don’t seem to make any strict assertions about having exactly the number of
> required responses, only that we have enough.)
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]