[
https://issues.apache.org/jira/browse/CASSANDRA-15442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16992941#comment-16992941
]
Jordan West commented on CASSANDRA-15442:
-----------------------------------------
Thanks for the patch [~yifanc]. I've actually wondered about it for a while but
never thought of it as a bug (assumed it was by design). Glad to see it
addressed. A few minor nits, all optional:
* I agree that we should only use the repair timeout for the reason you
mentioned. I find it confusing that the write timeout is involved since the
client has no idea if read repair will occur when submitting the read.
* Since `awaitRepairs` is only called from `awaitRepairsUntil` consider
merging the two functions. It might make the code more clear and require less
jumping around.
* I agree with [~bdeggleston] that setting the write timeout very high in your
test can show how the change ensures independence from it – regardless of any
other code changes. It may also give you a more flexible way to write the
assertion at the end of `readRepairTimeoutTest()`.
* Additionally, re: the two `Assert.assertTrue`s at the end of
`readRepairTimeoutTest`: consider adding failure messages to make them more
descriptive, as well as collapsing it in to one expression (e.g
`actualTimeTaken < reducedReadTimeout + magicDelayAmount && actualTimeTaken >
reducedReadTimeout`).
> Read repair implicitly increases read timeout value
> ---------------------------------------------------
>
> Key: CASSANDRA-15442
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15442
> Project: Cassandra
> Issue Type: Bug
> Components: Legacy/Core
> Reporter: Yifan Cai
> Assignee: Yifan Cai
> Priority: Normal
>
> When read repair occurs during a read, internally, it starts several
> _blocking_ operations in sequence. See
> {{org.apache.cassandra.service.StorageProxy#fetchRows}}.
> The timeline of the blocking operations
> # Regular read, wait for full data/digest read response to complete.
> {{reads[*].awaitResponses();}}
> # Read repair read, wait for full data read response to complete.
> {{reads[*].awaitReadRepair();}}
> # Read repair write, wait for write response to complete.
> {{concatAndBlockOnRepair(results, repairs);}}
> Step 1 and 2 share the same timeout, and wait for the duration of read
> timeout, say 5 s.
> Step 3 waits for the duration of write timeout, say 2 s.
> In the worse case, the actual time taken for a read could accumulate to ~7 s,
> if each individual step does not exceed the timeout value.
> From the client perspective, it may not expect a request taken higher than
> the database configured timeout value.
> Such scenario is especially bad for the clients that have set up client-side
> timeout monitoring close to the configured one. The clients think the
> operations timed out and abort, but they are in fact still running on server.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]