[
https://issues.apache.org/jira/browse/CASSANDRA-15442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16989238#comment-16989238
]
Yifan Cai commented on CASSANDRA-15442:
---------------------------------------
[~bdeggleston], thanks!
The patch is ready.
||Code||PR||Unit test||JVM dtest||
|[Code|https://github.com/yifan-c/cassandra/tree/CASSANDRA-15442-read-repair-timeout-fix]|[PR|https://github.com/apache/cassandra/pull/391]|[Unit test|https://app.circleci.com/jobs/github/yifan-c/cassandra/113]|[JVM dtest|https://app.circleci.com/jobs/github/yifan-c/cassandra/112]|
Briefly, the patch does the following:
# In BlockingReadRepair, the repair process now waits based on the read timeout value.
# Added awaitRepairsUntil, which accepts a future point in time as the timeout deadline.
# Added a timeout test to the JVM dtests.
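
The idea of waiting until a future point in time, rather than for a fresh duration at each step, can be sketched as follows. This is only an illustration and not the code from the patch: the class {{DeadlineAwait}} and the method {{awaitUntil}} are hypothetical names, and the 200 ms timeout is an arbitrary value chosen for the demo. It uses only {{java.util.concurrent.CountDownLatch}} from the JDK.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration; not the actual BlockingReadRepair code.
final class DeadlineAwait
{
    /**
     * Blocks until the latch reaches zero or the absolute deadline passes.
     * deadlineNanos is a System.nanoTime() value, not a duration.
     * Returns true if the latch completed in time, false otherwise.
     */
    static boolean awaitUntil(CountDownLatch latch, long deadlineNanos)
    {
        long remainingNanos = deadlineNanos - System.nanoTime();
        try
        {
            // With a non-positive timeout, await() does not block and
            // simply reports whether the count is already zero.
            return latch.await(remainingNanos, TimeUnit.NANOSECONDS);
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args)
    {
        // Assumed read timeout for the demo only.
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(200);

        CountDownLatch readDone = new CountDownLatch(1);
        CountDownLatch repairDone = new CountDownLatch(1);
        readDone.countDown(); // the read completes; the repair never does

        // Both steps are bounded by the same deadline, so the total
        // blocking time stays within roughly one read timeout.
        System.out.println("read completed:   " + awaitUntil(readDone, deadline));
        System.out.println("repair completed: " + awaitUntil(repairDone, deadline));
    }
}
```

Because every step measures the time remaining against the same deadline, a slow earlier step leaves less time for later steps instead of extending the overall request.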
> Read repair implicitly increases read timeout value
> ---------------------------------------------------
>
> Key: CASSANDRA-15442
> URL: https://issues.apache.org/jira/browse/CASSANDRA-15442
> Project: Cassandra
> Issue Type: Bug
> Components: Legacy/Core
> Reporter: Yifan Cai
> Assignee: Yifan Cai
> Priority: Normal
>
> When read repair occurs during a read, internally, it starts several
> _blocking_ operations in sequence. See
> {{org.apache.cassandra.service.StorageProxy#fetchRows}}.
> The timeline of the blocking operations:
> # Regular read, wait for full data/digest read response to complete.
> {{reads[*].awaitResponses();}}
> # Read repair read, wait for full data read response to complete.
> {{reads[*].awaitReadRepair();}}
> # Read repair write, wait for write response to complete.
> {{concatAndBlockOnRepair(results, repairs);}}
> Steps 1 and 2 share the same timeout and wait for the duration of the read
> timeout, say 5 s.
> Step 3 waits for the duration of the write timeout, say 2 s.
> In the worst case, the actual time taken for a read could accumulate to ~7 s,
> even if no individual step exceeds its timeout value.
> Clients may not expect a request to take longer than the database's
> configured timeout value.
> Such a scenario is especially bad for clients that have set up client-side
> timeout monitoring close to the configured value. The clients think the
> operations timed out and abort, but they are in fact still running on the server.
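
The worst-case arithmetic in the description can be written out as a small sketch. The class and method names here are hypothetical, and the 5 s and 2 s values are the example figures from the description, not necessarily the configured defaults.

```java
// Hypothetical illustration of the worst-case bound described in the issue.
final class ReadLatencyBound
{
    /**
     * Steps 1 and 2 share the read timeout; step 3 gets a separate
     * write timeout, so the two budgets add up.
     */
    static long worstCaseWithSeparateBudgetsMillis(long readTimeoutMillis, long writeTimeoutMillis)
    {
        return readTimeoutMillis + writeTimeoutMillis;
    }

    /** If all three steps were bounded by one deadline derived from the read timeout. */
    static long worstCaseWithSharedDeadlineMillis(long readTimeoutMillis)
    {
        return readTimeoutMillis;
    }

    public static void main(String[] args)
    {
        long readTimeoutMillis = 5000;  // example value from the description
        long writeTimeoutMillis = 2000; // example value from the description

        System.out.println(worstCaseWithSeparateBudgetsMillis(readTimeoutMillis, writeTimeoutMillis)); // prints 7000
        System.out.println(worstCaseWithSharedDeadlineMillis(readTimeoutMillis));                      // prints 5000
    }
}
```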