[https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel]

Blake Eggleston updated CASSANDRA-10726:
----------------------------------------
    Status: Patch Available  (was: Awaiting Feedback)

[trunk|https://github.com/bdeggleston/cassandra/tree/10726-v2]
 [dtests|https://github.com/bdeggleston/cassandra-dtest/tree/10726]
 [tests|https://circleci.com/workflow-run/7c271901-a224-4326-bb32-cd75f218ce96]

The patch makes the following two changes to read repair behavior.

After a digest mismatch, data requests are sent to all participants in the 
original request, but only CL.blockFor responses are required to proceed (this 
used to be CL.ALL, which would be 3/3 if we had speculated). The follow-up data 
read will now also speculatively read from an additional replica if it looks 
like one of the contacted replicas may not respond and another is available.
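A minimal sketch of the "wait for blockFor, not for everyone contacted" idea with a speculative extra read. This is hypothetical and not the patch's code: `BlockForReadTracker`, `onResponse`, `await`, the speculation delay, and the `spareReplicaAvailable` flag are all illustrative assumptions, and a `CountDownLatch` stands in for Cassandra's own response handling.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: names and timings are illustrative, not Cassandra's API.
// Data requests go to every participant of the original read, but the
// coordinator waits for only blockFor responses; if they have not arrived by
// the speculation delay, one extra replica is contacted.
final class BlockForReadTracker {
    private final CountDownLatch remaining;

    BlockForReadTracker(int blockFor) {
        this.remaining = new CountDownLatch(blockFor);
    }

    // Called once per data response. Responses beyond blockFor are ignored,
    // because countDown() on a latch already at zero has no effect.
    void onResponse() {
        remaining.countDown();
    }

    // Returns true if blockFor responses arrive within timeoutMs in total.
    // After speculateAfterMs without enough responses, runs `speculate`
    // (e.g. send one more data request) if a spare replica exists.
    boolean await(long speculateAfterMs, long timeoutMs,
                  boolean spareReplicaAvailable, Runnable speculate) {
        try {
            if (remaining.await(speculateAfterMs, TimeUnit.MILLISECONDS)) {
                return true;
            }
            if (spareReplicaAvailable) {
                speculate.run();
            }
            long left = Math.max(0, timeoutMs - speculateAfterMs);
            return remaining.await(left, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

For example, with RF=3 at QUORUM, blockFor is 2: `new BlockForReadTracker(2)` lets the read proceed after two responses even if all three replicas were contacted, which is the difference from the old CL.ALL behavior.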

When sending repair mutations, we now block only on CL.blockFor acks (this used 
to be CL.ALL). We will also speculatively send a repair mutation, containing 
the contents of all unacked mutations, to an additional node if it looks like 
one of the replicas may not respond.
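A rough sketch of blocking on blockFor repair acks and building the speculative mutation from everything still unacked. This is a hypothetical model, not the patch: `RepairWriteTracker`, `Cell`, and its methods are illustrative names; a mutation is simplified to a map of column to (value, timestamp) merged by last-write-wins; and counting replicas that needed no repair toward blockFor is an assumption about how the acks are tallied.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch, not Cassandra's API. A "mutation" here is a simplified
// map of column -> Cell; merging keeps the newest cell per column.
final class RepairWriteTracker {
    static final class Cell {
        final String value;
        final long timestamp;
        Cell(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    private final Map<String, Map<String, Cell>> unacked = new HashMap<>(); // replica -> pending repair
    private final Set<String> done = new HashSet<>(); // acked, or needed no repair
    private final int blockFor;

    RepairWriteTracker(int blockFor) {
        this.blockFor = blockFor;
    }

    // Records a repair mutation sent to a replica (including a speculative one).
    void sent(String replica, Map<String, Cell> mutation) {
        unacked.put(replica, mutation);
    }

    // Assumption: a participant that already had the resolved data counts toward blockFor.
    void noRepairNeeded(String replica) {
        done.add(replica);
    }

    void ack(String replica) {
        if (unacked.remove(replica) != null) {
            done.add(replica);
        }
    }

    // The read may return once blockFor participants are up to date, not all of them.
    boolean satisfied() {
        return done.size() >= blockFor;
    }

    // Merges every unacked mutation into one, to send to an additional replica.
    Map<String, Cell> mergedUnacked() {
        Map<String, Cell> merged = new HashMap<>();
        for (Map<String, Cell> mutation : unacked.values()) {
            for (Map.Entry<String, Cell> e : mutation.entrySet()) {
                merged.merge(e.getKey(), e.getValue(),
                        (a, b) -> a.timestamp >= b.timestamp ? a : b);
            }
        }
        return merged;
    }
}
```

Merging the unacked mutations means the extra node receives every update that any slow replica was missing, so a single ack from it can stand in for whichever replica fails to respond.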

(The C* branch is written on top of CASSANDRA-14353, so that's a dependency, 
but it should be committed soon.)

> Read repair inserts should not be blocking
> ------------------------------------------
>
>                 Key: CASSANDRA-10726
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10726
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Coordination
>            Reporter: Richard Low
>            Assignee: Blake Eggleston
>            Priority: Major
>             Fix For: 4.x
>
>
> Today, if there’s a digest mismatch in a foreground read repair, the insert 
> to update out of date replicas is blocking. This means, if it fails, the read 
> fails with a timeout. If a node is dropping writes (maybe it is overloaded or 
> the mutation stage is backed up for some other reason), all reads to a 
> replica set could fail. Further, replicas dropping writes get further out of 
> sync, so they will require more read repair.
> The code comment explaining why the writes are blocking is:
> {code}
> // wait for the repair writes to be acknowledged, to minimize impact on any replica that's
> // behind on writes in case the out-of-sync row is read multiple times in quick succession
> {code}
> but the bad side effect is that reads time out. Either the writes should not 
> be blocking, or we should return success for the read even if the write times 
> out.
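The issue's second option, returning success for the read even if the repair write times out, could look roughly like the following. This is a hypothetical sketch, not the committed patch: `NonFailingReadRepair`, `finishRead`, `repairWaitMs`, and modelling the repair acks as a `CompletableFuture<Void>` are all assumptions.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: wait a bounded time for read repair acks, but never
// turn a slow or failed repair write into a failed read.
final class NonFailingReadRepair {
    static <T> T finishRead(T resolved, CompletableFuture<Void> repairAcks, long repairWaitMs) {
        try {
            // Bounded wait, so a replica that is behind has a chance to catch up
            // before the same row is read again.
            repairAcks.get(repairWaitMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | ExecutionException | CancellationException e) {
            // Repair did not complete in time or failed: the replica may stay
            // out of sync, but the client still gets the resolved result.
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return resolved;
    }
}
```

The trade-off is the one the quoted code comment describes: if the repair write has not landed, a lagging replica may serve the same stale row again and trigger another repair, but the read itself no longer times out.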


