[jira] [Comment Edited] (CASSANDRA-14480) Digest mismatch requires all replicas to be responsive

Christian Spriegel (JIRA) Wed, 30 May 2018 07:56:26 -0700


    [ 
https://issues.apache.org/jira/browse/CASSANDRA-14480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16495254#comment-16495254
 ]


Christian Spriegel edited comment on CASSANDRA-14480 at 5/30/18 2:55 PM:
-------------------------------------------------------------------------

I did some more testing and tried the following change in 
StorageProxy.SinglePartitionReadLifecycle.awaitResultsAndRetryOnDigestMismatch():
{code:java}
repairHandler = new ReadCallback(resolver,
                                 ConsistencyLevel.ALL,
                                 consistency.blockFor(keyspace), // was: executor.getContactedReplicas().size()
                                 command,
                                 keyspace,
                                 executor.handler.endpoints);
{code}
This fixed the issue in my test scenario. However, it causes the read-repair 
to repair only 2 out of my 3 replicas, even in cases where all 3 replicas are 
available.
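
To make the numbers concrete, here is a small sketch of the arithmetic behind that change. The class and method names are made up for illustration and are not Cassandra code; the only thing taken from Cassandra is the usual quorum rule of floor(RF / 2) + 1.

```java
// Illustration only: BlockForSketch is a hypothetical class, not part of Cassandra.
final class BlockForSketch {

    // Standard quorum rule: floor(replicationFactor / 2) + 1.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    public static void main(String[] args) {
        int replicationFactor = 3;
        int contactedReplicas = 3; // all replicas reported as up, including a paused one

        // Before the change the repair handler waits for every contacted replica.
        System.out.println("wait for (before): " + contactedReplicas);
        // After the change it waits only for what QUORUM needs.
        System.out.println("wait for (after):  " + quorum(replicationFactor));
    }
}
```

With RF=3 the handler stops waiting after 2 responses, which is why the third replica is not guaranteed to be included in the repair.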

 

I could imagine an alternative solution where maybeAwaitFullDataRead() would 
still wait for all 3 replicas, but in case of an RTE (ReadTimeoutException) it 
would check whether 2 replicas responded and, if so, treat that as a successful 
read.
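
A rough sketch of that idea, to show the intended control flow only. LenientRepairWait and its methods are hypothetical names I made up for this example; this is not the ReadCallback API and not a patch.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the repair callback; not Cassandra's ReadCallback.
final class LenientRepairWait {
    private final int blockFor;                 // responses the consistency level needs, e.g. 2
    private final AtomicInteger received = new AtomicInteger();
    private final CountDownLatch allResponded;  // counts down once per contacted replica

    LenientRepairWait(int contactedReplicas, int blockFor) {
        this.blockFor = blockFor;
        this.allResponded = new CountDownLatch(contactedReplicas);
    }

    // Called once for each replica response that arrives.
    void onResponse() {
        received.incrementAndGet();
        allResponded.countDown();
    }

    // Waits for all contacted replicas. On timeout, the read still counts as
    // successful if at least blockFor replicas have responded.
    boolean awaitMillis(long timeoutMillis) {
        try {
            if (allResponded.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
                return true; // every contacted replica answered in time
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return received.get() >= blockFor;
    }
}
```

With 3 contacted replicas and blockFor = 2, two responses followed by a timeout would return true, while a single response would still fail as it does today.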


was (Author: christianmovi):
I did some more testing and tried the following change in 
StorageProxy.SinglePartitionReadLifecycle.awaitResultsAndRetryOnDigestMismatch():
{code:java}
repairHandler = new ReadCallback(resolver,
                                 ConsistencyLevel.ALL,
                                 consistency.blockFor(keyspace), // was: executor.getContactedReplicas().size()
                                 command,
                                 keyspace,
                                 executor.handler.endpoints);
{code}
This fixed the issue in my test scenario. However, it causes the read-repair 
to repair only 2 out of my 3 replicas, even in cases where all 3 replicas are 
available.

> Digest mismatch requires all replicas to be responsive
> ------------------------------------------------------
>
>                 Key: CASSANDRA-14480
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14480
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Core
>            Reporter: Christian Spriegel
>            Priority: Major
>         Attachments: Reader.java, Writer.java, schema_14480.cql
>
>
> I ran across a scenario where a digest mismatch causes a read-repair that 
> requires all up nodes to be able to respond. If one of these nodes is not 
> responding, then the read-repair is reported to the client as a 
> ReadTimeoutException.
>  
> My expectation would be that a read at CL=QUORUM always succeeds as long as 
> 2 nodes are responding. Unfortunately, the third node being "up" in the ring 
> but unable to respond does lead to an RTE.
>  
>  
> I came up with a scenario that reproduces the issue:
>  # set up a 3 node cluster using ccm
>  # increase the phi_convict_threshold to 16, so that nodes are permanently 
> reported as up
>  # create attached schema
>  # run the attached reader and writer (which only connect to node1 and 
> node2). This should already produce digest mismatches
>  # do a "ccm node3 pause"
>  # The reader will report a read-timeout with consistency QUORUM (2 responses 
> were required but only 1 replica responded). Within the 
> DigestMismatchException catch-block it can be seen that the repairHandler is 
> waiting for 3 responses, even though the exception says that 2 responses are 
> required.
>  



