[
https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15068248#comment-15068248
]
Jonathan Ellis commented on CASSANDRA-10726:
--------------------------------------------
Seeing reads "go backwards in time" is one of the most confusing aspects of
eventual consistency for people, so I do think it's important that quorum reads
avoid that, even more so because users tend to oversimplify quorum reads as
"strong consistency that means I don't have to think about EC." So to the
degree we can make that assumption true, we should, especially if that's been
our behavior already for 4+ years.
It seems like there are two primary problem scenarios:
* When a node is overloaded for writes, this stops reads as well. First,
delaying reads when we're behind on writes is arguably a good thing that will
help you recover faster. Second, the right way to tackle this is with better
handling of the write overload as in CASSANDRA-9318.
* When data is read-only because disks are failing. I agree with Sylvain that
half-broken is often worse than completely broken, and in this specific case if
a disk puts itself in read-only mode then it won't be long until it isn't
readable either. This is another case where "mark a disk bad and broadcast to
other nodes not to send me requests for tokens pinned to it" as envisioned in
CASSANDRA-6696 would be useful, along with an option for "promote write errors
to blacklist on reads as well."
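To make the "go backwards in time" concern concrete, here is a minimal sketch. It is a toy model, not Cassandra code: the class name, the method, and the three-replica layout are all made up for illustration. It assumes one replica holds a newer write (timestamp 2) than the other two (timestamp 1), and that two consecutive quorum reads contact different replica pairs.

```java
public class MonotonicQuorumSketch {
    /**
     * Toy quorum read over two of three replicas (hypothetical model).
     * Each array slot holds the timestamp of the newest write on that replica.
     * When repairApplied is true, both contacted replicas are brought up to the
     * newest timestamp before the read returns, as a blocking read repair would.
     */
    public static long quorumRead(long[] replicas, int first, int second, boolean repairApplied) {
        long newest = Math.max(replicas[first], replicas[second]);
        if (repairApplied) {
            replicas[first] = newest;
            replicas[second] = newest;
        }
        return newest;
    }

    public static void main(String[] args) {
        // Repair write is lost: replica 1 stays stale after the first read.
        long[] lost = {2, 1, 1};
        System.out.println(quorumRead(lost, 0, 1, false)); // 2
        System.out.println(quorumRead(lost, 1, 2, false)); // 1, the read went backwards

        // Repair write is acknowledged before the first read returns.
        long[] repaired = {2, 1, 1};
        System.out.println(quorumRead(repaired, 0, 1, true)); // 2
        System.out.println(quorumRead(repaired, 1, 2, true)); // 2, still monotonic
    }
}
```

In this model the second read overlaps the first in at least one replica, so waiting for the repair to land is what keeps the second result from being older than the first.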
> Read repair inserts should not be blocking
> ------------------------------------------
>
> Key: CASSANDRA-10726
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10726
> Project: Cassandra
> Issue Type: Improvement
> Components: Coordination
> Reporter: Richard Low
>
> Today, if there’s a digest mismatch in a foreground read repair, the insert
> to update out of date replicas is blocking. This means, if it fails, the read
> fails with a timeout. If a node is dropping writes (maybe it is overloaded or
> the mutation stage is backed up for some other reason), all reads to a
> replica set could fail. Further, replicas dropping writes get more out of
> sync so will require more read repair.
> The comment on the code for why the writes are blocking is:
> {code}
> // wait for the repair writes to be acknowledged, to minimize impact on any replica that's
> // behind on writes in case the out-of-sync row is read multiple times in quick succession
> {code}
> but the bad side effect is that reads time out. Either the writes should not
> be blocking or we should return success for the read even if the write times
> out.
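The difference between the behavior described in the ticket and the second proposed alternative (return success for the read even if the repair write times out) can be sketched roughly as follows. This is not the Cassandra implementation: `ReadRepairSketch`, `RepairPolicy`, and `readAfterMismatch` are hypothetical names, and a `CompletableFuture` stands in for the acknowledgement of the repair write.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ReadRepairSketch {
    /** Hypothetical policy names for illustration; these are not Cassandra options. */
    public enum RepairPolicy { BLOCKING, BEST_EFFORT }

    /**
     * Called after a digest mismatch has been resolved to a reconciled value.
     * repairAck completes once the out-of-date replica acknowledges the repair write.
     */
    public static String readAfterMismatch(String reconciled,
                                           CompletableFuture<Void> repairAck,
                                           long timeoutMillis,
                                           RepairPolicy policy) throws TimeoutException {
        try {
            // Both policies wait up to the timeout for the repair write.
            repairAck.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            if (policy == RepairPolicy.BLOCKING) {
                throw e; // behavior described in the ticket: the read fails with a timeout
            }
            // BEST_EFFORT: fall through and return the data despite the missing ack.
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        }
        return reconciled;
    }

    public static void main(String[] args) {
        // A replica that is dropping writes never acknowledges the repair.
        CompletableFuture<Void> neverAcked = new CompletableFuture<>();
        try {
            System.out.println(readAfterMismatch("row", neverAcked, 50, RepairPolicy.BEST_EFFORT));
            readAfterMismatch("row", neverAcked, 50, RepairPolicy.BLOCKING);
        } catch (TimeoutException e) {
            System.out.println("read timed out waiting for repair write");
        }
    }
}
```

Under these assumptions, `BEST_EFFORT` keeps reads available while a replica is dropping writes, at the cost of the guarantee that a later read will not return older data.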
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)