[
https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16531020#comment-16531020
]
Marcus Eriksson commented on CASSANDRA-10726:
---------------------------------------------
this looks good to me in general, a few minor comments below
* Update {{StorageProxy#fetchRows}} comment or maybe add inline comments on
each of the calls in that method
* {{private final ReadRepair readRepair}} in {{ReadCallback}} can be removed
since background RR was removed
* In {{BlockingPartitionRepair}}, make the {{pendingRepairs}} {{Map}} a
{{ConcurrentMap}} to indicate that it needs to be concurrent
* {{BlockingPartitionRepair#isLocal}} is not tested, but annotated with
{{@VisibleForTesting}} (could just remove it and call {{CL.isLocal}} in
{{shouldBlockOn}})
* {{BlockingPartitionRepair#sendRR}} is not tested but annotated with
{{@VisibleForTesting}}, make private and remove annotation?
* {{BlockingReadRepair}} {{endpoints}} is not used
* {{BlockingReadRepair}} needs comments on its methods, for example in
{{startRepair}}, why {{digestResolver}} is unused and the difference between
{{allEndpoints}} and {{targetEndpoints}}
* {{BlockingReadRepair#shouldSpeculate}} could be private or add a test as it
is not completely trivial
* {{BlockingReadRepair#maybeSendAdditionalDataRequests}} sends out full data
reads to all non-contacted endpoints (in the local dc if cl is local) - should
we cap this at 1 additional replica? I guess it will most often just be a
single extra replica, but might make sense to limit it for high-rf clusters?
* {{ReadRepair#awaitRepair}}/{{ReadRepair#awaitRepairs}} naming is a bit
confusing, maybe rename {{awaitRepair()}} to {{awaitRepairReads()}} or
something?
* {{ReadRepair}} interface needs comments on all methods
* Maybe throw {{UnsupportedOperationException}} in
{{PartitionRangeReadCommand#getReplicaToken}} instead since we never call
{{sendAdditionalDataRequests}} when doing {{PartitionRangeReadCommand}}. An
option could perhaps be to pass in the key to
{{BlockingReadRepair#startRepair}} and keep it in {{BRR.DigestRepair}} and
remove {{getReplicaToken()}}?
* nice dtests, they need {{@since('4.0')}} and a few have empty docstrings
> Read repair inserts should not be blocking
> ------------------------------------------
>
> Key: CASSANDRA-10726
> URL: https://issues.apache.org/jira/browse/CASSANDRA-10726
> Project: Cassandra
> Issue Type: Improvement
> Components: Coordination
> Reporter: Richard Low
> Assignee: Blake Eggleston
> Priority: Major
> Fix For: 4.x
>
>
> Today, if there’s a digest mismatch in a foreground read repair, the insert
> to update out of date replicas is blocking. This means, if it fails, the read
> fails with a timeout. If a node is dropping writes (maybe it is overloaded or
> the mutation stage is backed up for some other reason), all reads to a
> replica set could fail. Further, replicas dropping writes get more out of
> sync so will require more read repair.
> The comment on the code for why the writes are blocking is:
> {code}
> // wait for the repair writes to be acknowledged, to minimize impact on any
> replica that's
> // behind on writes in case the out-of-sync row is read multiple times in
> quick succession
> {code}
> but the bad side effect is that reads timeout. Either the writes should not
> be blocking or we should return success for the read even if the write times
> out.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]