[ 
https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17049745#comment-17049745
 ] 

Jeff Jirsa commented on CASSANDRA-10726:
----------------------------------------

[~jjordan]:

bq. In 3.11 you should set both read_repair_chance and 
dc_local_read_repair_chance to 0 on all your tables.

That's not what this is about. This is about foreground read repair, where 
one of the queried replicas returns data that doesn't match the others.

bq. If such things are common on your cluster, then you will want to 
investigate the performance issues further and try to resolve them. This change 
means your application will no longer be trying to wait for the replicas to be 
up to date before the read returns

It helps when disks are failing, and it helps when GC is high. Both of those 
things happen in the real world, and this patch turns a failing machine into a 
non-issue.

That said - backporting this patch is almost certainly not going to happen. 
It's a large patch in a critical piece of code and it's not suitable for a 
stable branch like 3.11. It's a meaningful, important patch, but it's probably 
safest to wait for 4.0.



> Read repair inserts should not be blocking
> ------------------------------------------
>
>                 Key: CASSANDRA-10726
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10726
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Legacy/Coordination
>            Reporter: Richard Low
>            Assignee: Blake Eggleston
>            Priority: Normal
>             Fix For: 4.0
>
>
> Today, if there’s a digest mismatch in a foreground read repair, the insert 
> that updates out-of-date replicas is blocking. This means that if the insert 
> fails, the read fails with a timeout. If a node is dropping writes (maybe it 
> is overloaded or the mutation stage is backed up for some other reason), all 
> reads to a replica set could fail. Further, replicas that drop writes get 
> more out of sync, so they will require more read repair.
> The comment on the code for why the writes are blocking is:
> {code}
> // wait for the repair writes to be acknowledged, to minimize impact on any replica that's
> // behind on writes in case the out-of-sync row is read multiple times in quick succession
> {code}
> but the bad side effect is that reads time out. Either the writes should not 
> be blocking or we should return success for the read even if the write times 
> out.
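
To make the second option in the description concrete (return success for the 
read even if the repair write times out), here is a minimal sketch. It is not 
the Cassandra implementation and not the patch attached to this ticket: the 
class ReadRepairSketch, both methods, and the use of a CompletableFuture to 
stand in for the repair-write acknowledgement are assumptions made purely for 
illustration.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch; none of these names come from the Cassandra code base.
public class ReadRepairSketch {

    // Blocking behaviour described in the ticket: the read fails with a
    // timeout if the repair write is not acknowledged in time.
    public static String readBlocking(String mergedRow,
                                      CompletableFuture<Void> repairAck,
                                      long timeoutMs) throws TimeoutException {
        try {
            repairAck.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        }
        return mergedRow;
    }

    // Tolerant behaviour: still wait for the acknowledgement, but return the
    // already-merged row even when the repair write times out or fails.
    public static String readTolerant(String mergedRow,
                                      CompletableFuture<Void> repairAck,
                                      long timeoutMs) {
        try {
            repairAck.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException | ExecutionException e) {
            // The replica dropped or delayed the repair write; the coordinator
            // already holds the merged result, so the read can still succeed.
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return mergedRow;
    }

    public static void main(String[] args) {
        // A future that never completes simulates a replica dropping writes.
        CompletableFuture<Void> neverAcked = new CompletableFuture<>();
        try {
            readBlocking("merged-row", neverAcked, 50);
            System.out.println("blocking: ok");
        } catch (TimeoutException e) {
            System.out.println("blocking: read timed out");
        }
        System.out.println("tolerant: " + readTolerant("merged-row", neverAcked, 50));
    }
}
```

With a replica that never acknowledges the repair write, the blocking variant 
surfaces a timeout to the caller while the tolerant variant still returns the 
merged row. The trade-off named in the code comment quoted above remains: the 
lagging replica may stay out of sync until a later repair succeeds.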



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

