[ 
https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16129843#comment-16129843
 ] 

Xiaolong Jiang commented on CASSANDRA-10726:
--------------------------------------------

[~krummas] we don't want to share results. If a speculative read retry kicks 
in, sharing them causes interference and may trigger a background async 
repair, which means the repair is done twice: once in the background and once 
by the foreground read repair. We were already doing duplicated repairs before 
this change. 
So I want the two repairs (foreground and background async repair) not to 
interfere with each other. Ideally, I would return the map as the input for 
the next steps, which would leave no race at all. However, the iterator close 
behavior makes it impossible to pass the value around cleanly, which is how I 
ended up with this shared map hack. [~iamaleksey] I would rather not refactor 
the whole read pipeline right now, even though I do agree the code has become 
very complicated :(.  
Regarding the 1M rows timeout, I compared what we did before with what I am 
doing now. It turns out the previous code waits for the repair to come back 
using the write RPC timeout. I was hoping to improve on that, but it turns out 
I was making it worse. It's better to wait longer than to return a failure for 
the read. If we still cannot get a result after waiting longer, the client 
will get a timeout anyway.  

Thus I changed the repair wait timeout back to the same value as before. I 
also ran the stress test with 1M rows, shutting down node3 while writing and 
then reading with cl=ALL to force read repair. As far as I can tell, there are 
no read timeouts now. I pushed the fix to the same PR I sent previously. Could 
you please check again?

> Read repair inserts should not be blocking
> ------------------------------------------
>
>                 Key: CASSANDRA-10726
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10726
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Coordination
>            Reporter: Richard Low
>            Assignee: Xiaolong Jiang
>             Fix For: 4.x
>
>
> Today, if there’s a digest mismatch in a foreground read repair, the insert 
> to update out-of-date replicas is blocking. This means, if it fails, the read 
> fails with a timeout. If a node is dropping writes (maybe it is overloaded or 
> the mutation stage is backed up for some other reason), all reads to a 
> replica set could fail. Further, replicas dropping writes get more out of 
> sync so will require more read repair.
> The comment on the code for why the writes are blocking is:
> {code}
> // wait for the repair writes to be acknowledged, to minimize impact on any replica that's
> // behind on writes in case the out-of-sync row is read multiple times in quick succession
> {code}
> but the bad side effect is that reads time out. Either the writes should not 
> be blocking or we should return success for the read even if the write times 
> out.
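
The second option in the description above (return success for the read even 
if the repair write times out) can be sketched roughly as follows. This is an 
illustrative sketch only, not Cassandra's actual read path: the class 
ReadRepairSketch, the RepairOutcome enum, and the methods awaitRepairs and 
readWithRepair are hypothetical names, and the repair acknowledgements are 
modeled as plain CompletableFuture objects rather than real replica responses. 

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class ReadRepairSketch {

    /** Outcome of waiting on the repair writes (hypothetical, not a Cassandra type). */
    enum RepairOutcome { ACKED, TIMED_OUT, FAILED }

    /**
     * Waits up to timeoutMillis for every repair write to be acknowledged.
     * A timeout or a failed write is reported as a value instead of an
     * exception, so the caller can still answer the read.
     */
    static RepairOutcome awaitRepairs(List<CompletableFuture<Void>> repairAcks,
                                      long timeoutMillis) {
        CompletableFuture<Void> all =
            CompletableFuture.allOf(repairAcks.toArray(new CompletableFuture[0]));
        try {
            all.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return RepairOutcome.ACKED;
        } catch (TimeoutException e) {
            return RepairOutcome.TIMED_OUT;
        } catch (ExecutionException e) {
            return RepairOutcome.FAILED;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return RepairOutcome.FAILED;
        }
    }

    /** Returns the already-resolved read result whatever the repair outcome was. */
    static String readWithRepair(String resolved,
                                 List<CompletableFuture<Void>> repairAcks,
                                 long timeoutMillis) {
        RepairOutcome outcome = awaitRepairs(repairAcks, timeoutMillis);
        if (outcome != RepairOutcome.ACKED) {
            // The repair did not complete in time; log it but do not fail the read.
            System.err.println("read repair not acknowledged: " + outcome);
        }
        return resolved;
    }

    public static void main(String[] args) {
        CompletableFuture<Void> acked = CompletableFuture.completedFuture(null);
        CompletableFuture<Void> dropped = new CompletableFuture<>(); // never completes
        // One replica drops the repair write, yet the read still returns its result.
        System.out.println(readWithRepair("row-v2", List.of(acked, dropped), 50));
    }
}
```

In this sketch the wait is still bounded by a timeout, so the benefit noted in 
the code comment (giving lagging replicas a chance to catch up) is kept when 
the writes succeed, while a replica that drops writes no longer turns the read 
into a timeout. 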



