[ https://issues.apache.org/jira/browse/CASSANDRA-10726?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16129843#comment-16129843 ]
Xiaolong Jiang commented on CASSANDRA-10726:
--------------------------------------------

[~krummas] We don't want to share results. If a speculative read retry kicks in, it will mess things up and may trigger a background async repair, which would actually do the repair twice, together with the foreground read repair. We were already doing duplicate repairs before, so I want the two repairs (foreground and background async) not to interfere with each other. Ideally, I would return the map as the input for the next steps, so that we would not have any race at all. However, the iterator close logic makes it impossible to pass the value around cleanly, which is how I ended up with this shared map hack.

[~iamaleksey] I would not refactor the whole read pipeline right now, I guess, even though I do agree the code has become very complicated :(.

Regarding the 1M rows timeout, I compared what we did before with what I am doing now. It turns out the previous code waited for the repair responses using the write RPC timeout. I was hoping to make it better, but it turns out I was making it worse. It's better to wait longer than to return a failure for the read. If we cannot get a result even after waiting longer, the client will get a timeout anyway. So I changed the repair wait timeout back to the same value as before. I also ran the stress test with 1M rows, shutting down node3 while writing and then reading with cl=ALL to force read repair. As far as I can tell, there is no read timeout now. I pushed the fix to the same PR I sent previously. Could you please check again?

> Read repair inserts should not be blocking
> ------------------------------------------
>
>                 Key: CASSANDRA-10726
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-10726
>             Project: Cassandra
>          Issue Type: Improvement
>          Components: Coordination
>            Reporter: Richard Low
>            Assignee: Xiaolong Jiang
>             Fix For: 4.x
>
> Today, if there’s a digest mismatch in a foreground read repair, the insert to update out-of-date replicas is blocking.
> This means, if it fails, the read fails with a timeout. If a node is dropping writes (maybe it is overloaded or the mutation stage is backed up for some other reason), all reads to a replica set could fail. Further, replicas dropping writes get more out of sync so will require more read repair.
> The comment on the code for why the writes are blocking is:
> {code}
> // wait for the repair writes to be acknowledged, to minimize impact on any replica that's
> // behind on writes in case the out-of-sync row is read multiple times in quick succession
> {code}
> but the bad side effect is that reads time out. Either the writes should not be blocking or we should return success for the read even if the write times out.
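
The second option in the description (return success for the read even if the repair write times out) could look roughly like the sketch below. This is a standalone illustration only, not Cassandra's read path: `ReadRepairSketch`, `RepairOutcome`, `awaitRepairs`, `readWithRepair`, and the use of `CompletableFuture` to stand in for repair acknowledgements are all hypothetical names and choices made for the example.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical sketch: every name here is an assumption, not Cassandra code.
public class ReadRepairSketch {
    enum RepairOutcome { ACKED, TIMED_OUT, FAILED }

    // Wait up to timeoutMillis for every repair write to be acknowledged.
    static RepairOutcome awaitRepairs(List<CompletableFuture<Void>> repairAcks, long timeoutMillis) {
        CompletableFuture<Void> all =
            CompletableFuture.allOf(repairAcks.toArray(new CompletableFuture[0]));
        try {
            all.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return RepairOutcome.ACKED;
        } catch (TimeoutException e) {
            return RepairOutcome.TIMED_OUT;
        } catch (ExecutionException e) {
            return RepairOutcome.FAILED;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return RepairOutcome.FAILED;
        }
    }

    // Return the already-resolved row whether or not the repair writes were
    // acknowledged; an unacknowledged repair is only reported, never thrown.
    static String readWithRepair(String resolvedRow,
                                 List<CompletableFuture<Void>> repairAcks,
                                 long timeoutMillis) {
        RepairOutcome outcome = awaitRepairs(repairAcks, timeoutMillis);
        if (outcome != RepairOutcome.ACKED) {
            System.err.println("read repair not acknowledged: " + outcome);
        }
        return resolvedRow;
    }
}
```

In this sketch the wait is still bounded by a timeout, in line with the comment above about waiting as long as before, but a slow or write-dropping replica no longer turns into a failed read.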