[ 
https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16531020#comment-16531020
 ] 

Marcus Eriksson commented on CASSANDRA-10726:
---------------------------------------------

this looks good to me in general, a few minor comments below

* Update {{StorageProxy#fetchRows}} comment or maybe add inline comments on 
each of the calls in that method
* {{private final ReadRepair readRepair}} in {{ReadCallback}} can be removed 
since background RR was removed
* In {{BlockingPartitionRepair}}, make the {{pendingRepairs}} {{Map}} a 
{{ConcurrentMap}} to indicate that it needs to be concurrent
* {{BlockingPartitionRepair#isLocal}} is not tested, but annotated with 
{{@VisibleForTesting}} (could just remove it and call {{CL.isLocal}} in 
{{shouldBlockOn}})
* {{BlockingPartitionRepair#sendRR}} is not tested but annotated with 
{{@VisibleForTesting}}, make private and remove annotation?
* {{BlockingReadRepair}} {{endpoints}} is not used
* {{BlockingReadRepair}} needs comments on its methods, for example in 
{{startRepair}}, why {{digestResolver}} is unused and the difference between 
{{allEndpoints}} and {{targetEndpoints}}
* {{BlockingReadRepair#shouldSpeculate}} could be private or add a test as it 
is not completely trivial
* {{BlockingReadRepair#maybeSendAdditionalDataRequests}} sends out full data 
reads to all non-contacted endpoints (in the local dc if cl is local) - should 
we cap this at 1 additional replica? I guess it will most often just be a 
single extra replica, but might make sense to limit it for high-rf clusters?
* {{ReadRepair#awaitRepair}}/{{ReadRepair#awaitRepairs}} naming is a bit 
confusing, maybe rename {{awaitRepair()}} to {{awaitRepairReads()}} or 
something?
* {{ReadRepair}} interface needs comments on all methods
* Maybe throw {{UnsupportedOperationException}} in 
{{PartitionRangeReadCommand#getReplicaToken}} instead since we never call 
{{sendAdditionalDataRequests}} when doing {{PartitionRangeReadCommand}}. An 
option could perhaps be to pass in the key to 
{{BlockingReadRepair#startRepair}} and keep it in {{BRR.DigestRepair}} and 
remove {{getReplicaToken()}}?

* nice dtests, they need {{@since('4.0')}} and a few have empty docstrings

> Read repair inserts should not be blocking
> ------------------------------------------
>
>                 Key: CASSANDRA-10726
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10726
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Coordination
>            Reporter: Richard Low
>            Assignee: Blake Eggleston
>            Priority: Major
>             Fix For: 4.x
>
>
> Today, if there’s a digest mismatch in a foreground read repair, the insert 
> to update out of date replicas is blocking. This means, if it fails, the read 
> fails with a timeout. If a node is dropping writes (maybe it is overloaded or 
> the mutation stage is backed up for some other reason), all reads to a 
> replica set could fail. Further, replicas dropping writes get more out of 
> sync so will require more read repair.
> The comment on the code for why the writes are blocking is:
> {code}
> // wait for the repair writes to be acknowledged, to minimize impact on any 
> replica that's
> // behind on writes in case the out-of-sync row is read multiple times in 
> quick succession
> {code}
> but the bad side effect is that reads timeout. Either the writes should not 
> be blocking or we should return success for the read even if the write times 
> out.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to