[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15093872#comment-15093872 ]
Sylvain Lebresne commented on CASSANDRA-10726:
----------------------------------------------
bq. Would a reasonable halfway house be to keep the write as blocking but
return success in the case of a write timeout?
That would still break the "monotonic quorum reads" guarantee: unless you get
positive acks from the read-repair writes, you can't guarantee that a quorum of
replicas is now up to date. Granted, it will work more often if we do that (than
if we don't block at all), but guarantees are not about "most of the time" :)
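
To make the point concrete, here is a minimal sketch of the failure mode, assuming a toy model with 3 replicas and quorum reads that each contact 2 of them. The class and method names are hypothetical and none of this is Cassandra code; each array slot is simply the newest write timestamp a replica holds.

```java
public class MonotonicQuorumSketch {
    // Hypothetical model: the coordinator reads two replicas and returns the
    // newest timestamp it saw. Only when the repair write is positively acked
    // do we know the stale replica actually stored the newer value.
    static long quorumRead(long[] replicas, int first, int second, boolean repairAcked) {
        long newest = Math.max(replicas[first], replicas[second]);
        if (repairAcked) {
            replicas[first] = newest;
            replicas[second] = newest;
        }
        // Without an ack the read still returns the newest value, but the
        // stale replica may be left unchanged.
        return newest;
    }

    // Two consecutive quorum reads that hit different pairs of replicas.
    static boolean isMonotonic(boolean repairAcked) {
        long[] replicas = {2, 1, 1}; // only replica 0 has the latest write
        long firstRead = quorumRead(replicas, 0, 1, repairAcked);
        long secondRead = quorumRead(replicas, 1, 2, repairAcked);
        return secondRead >= firstRead;
    }

    public static void main(String[] args) {
        System.out.println("repair acked:     monotonic = " + isMonotonic(true));
        System.out.println("repair timed out: monotonic = " + isMonotonic(false));
    }
}
```

In the timed-out case the first read returns timestamp 2 while the second read, served by the two replicas that never applied the repair, returns timestamp 1. That is exactly the case where returning success on a write timeout goes back in time.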
And just to recap my personal position on this, I do feel we should keep the
guarantee, at least by default, and I still feel the right fix for the scenario
you're complaining about is a better way to deal with nodes backing up on
writes. But we all know that's easier said than fixed, and while I'd rather we
spend time on that better way to deal with the 2 scenarios [~jbellis] mentioned
above, I'm not too strongly opposed to a -D stopgap for advanced users.
> Read repair inserts should not be blocking
> ------------------------------------------
>
> Key: CASSANDRA-10726
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10726
> Project: Cassandra
> Issue Type: Improvement
> Components: Coordination
> Reporter: Richard Low
>
> Today, if there’s a digest mismatch in a foreground read repair, the insert
> to update out-of-date replicas is blocking. This means, if it fails, the read
> fails with a timeout. If a node is dropping writes (maybe it is overloaded or
> the mutation stage is backed up for some other reason), all reads to a
> replica set could fail. Further, replicas dropping writes get more out of
> sync so will require more read repair.
> The comment on the code for why the writes are blocking is:
> {code}
> // wait for the repair writes to be acknowledged, to minimize impact on any replica that's
> // behind on writes in case the out-of-sync row is read multiple times in quick succession
> {code}
> but the bad side effect is that reads time out. Either the writes should not
> be blocking or we should return success for the read even if the write times
> out.
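
For reference, the current behaviour and the two alternatives proposed in the description can be sketched as three policies applied to the future that completes when the repair write is acked. This is only an illustration under assumptions: the `Policy` enum, `finishRead`, and the use of `CompletableFuture` are hypothetical and do not reflect how the coordinator is actually written.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ReadRepairPolicySketch {
    // Hypothetical names for the three behaviours under discussion.
    enum Policy { BLOCK_AND_FAIL, BLOCK_BUT_IGNORE_TIMEOUT, DO_NOT_BLOCK }

    // Returns true if the read succeeds from the client's point of view.
    static boolean finishRead(CompletableFuture<Void> repairAck, Policy policy, long timeoutMillis) {
        if (policy == Policy.DO_NOT_BLOCK) {
            return true; // the repair write is fire-and-forget
        }
        try {
            repairAck.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return true; // positive ack: the stale replica is now up to date
        } catch (TimeoutException e) {
            // Today's behaviour fails the read here; the "halfway house"
            // returns success without knowing whether the repair was applied.
            return policy == Policy.BLOCK_BUT_IGNORE_TIMEOUT;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (ExecutionException e) {
            return false; // the repair write itself failed
        }
    }

    public static void main(String[] args) {
        CompletableFuture<Void> neverAcked = new CompletableFuture<>();
        for (Policy p : Policy.values()) {
            System.out.println(p + " -> read succeeds: " + finishRead(neverAcked, p, 50));
        }
    }
}
```

Only the first policy ties read success to a positive ack. The other two avoid the read timeout, but give up the guarantee that a quorum of replicas holds the value that was just returned.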