Caleb Rackliffe created CASSANDRA-16807:
-------------------------------------------

             Summary: Weak visibility guarantees of Accumulator lead to failed 
assertions during digest comparison
                 Key: CASSANDRA-16807
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-16807
             Project: Cassandra
          Issue Type: Bug
          Components: Consistency/Coordination
            Reporter: Caleb Rackliffe
            Assignee: Caleb Rackliffe


This problem could manifest on all versions, beginning on at least 3.0, but 
I’ll focus on the way it manifests in 4.0 here.

In what now seems like a wise move, CASSANDRA-16097 added an assertion to 
{{DigestResolver#responseMatch()}} that ensures the responses snapshot has at 
least one visible elements to compare (although of course only one element 
trivially cannot generate a mismatch and short-circuits immediately). However, 
at the point {{ReadCallback#onResponse()}} signals the waiting resolver, there 
is no guarantee that the size of the generated snapshot of the responses 
{{Accumulator}} is non-zero, or perhaps more worryingly, at least equal to the 
number of blocked-for responses. This seems to be a consequence of the 
documented weak visibility guarantees on {{Accumulator#add()}}. In short, if 
there are concurrent invocations of add(), is it not guaranteed that there is 
any visible size change after any one of them return, but only after all 
complete.

The particular exception looks something like this:

{noformat}
java.lang.AssertionError: Attempted response match comparison while no 
responses have been received.
        at 
org.apache.cassandra.service.reads.DigestResolver.responsesMatch(DigestResolver.java:110)
        at 
org.apache.cassandra.service.reads.AbstractReadExecutor.awaitResponses(AbstractReadExecutor.java:393)
        at 
org.apache.cassandra.service.StorageProxy.fetchRows(StorageProxy.java:2150)
        at 
org.apache.cassandra.service.StorageProxy.readRegular(StorageProxy.java:1979)
        at 
org.apache.cassandra.service.StorageProxy.read(StorageProxy.java:1882)
        at 
org.apache.cassandra.db.SinglePartitionReadCommand$Group.execute(SinglePartitionReadCommand.java:1121)
        at 
org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:296)
        at 
org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:248)
        at 
org.apache.cassandra.cql3.statements.SelectStatement.execute(SelectStatement.java:90)
{noformat}

It’s possible to reproduce this on simple single-partition reads without any 
short-read protection or replica filtering protection. I’ve also been able to 
reproduce this synthetically with a unit test on {{ReadCallback}}.

It seems like the most straightforward way to fix this would be to avoid 
signaling in {{ReadCallback#onResponse()}} until the visible size of the 
accumulator is at least the number of received responses. In most cases, this 
is trivially true, and our signaling behavior won’t change at all. In the very 
rare case that there are two (or more) concurrent calls to {{onResponse()}}, 
the second (or last) will signal, and having one more response than we strictly 
need should have no negative side effects. (We don’t seem to make any strict 
assertions about having exactly the number of required responses, only that we 
have enough.)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to