[
https://issues.apache.org/jira/browse/CASSANDRA-14983?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Benedict reassigned CASSANDRA-14983:
------------------------------------
Assignee: Marcus Olsson
Reviewers: Benedict
> Local reads potentially blocking remote reads
> ---------------------------------------------
>
> Key: CASSANDRA-14983
> URL: https://issues.apache.org/jira/browse/CASSANDRA-14983
> Project: Cassandra
> Issue Type: Bug
> Components: Consistency/Coordination
> Reporter: Marcus Olsson
> Assignee: Marcus Olsson
> Priority: Low
> Attachments: graph_local_read.html, graph_local_read_trunk.html,
> local_read_trace.log
>
>
> Since CASSANDRA-4718 there is a fast path that lets local requests [execute in
> the same
> thread|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/reads/AbstractReadExecutor.java#L157]
> rather than being handed off to the read stage.
> Based on the comment
> {code:java}
> // We delay the local (potentially blocking) read till the end to avoid stalling remote requests.
> {code}
> it seems that the local read should be performed last in the chain so that it
> does not block remote requests, but that is not the case when the local
> request is a data request. Digest requests are sent after the data requests
> (and now after the transient replica requests as well). When the fast path is
> used for a local data or transient data request, the next type of request
> cannot be sent until the local read has finished, which adds latency to the
> request.
> In addition, it seems that local requests are *always* data requests (which
> might not be a problem), but the log message can say either ["digest" or
> "data"|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/reads/AbstractReadExecutor.java#L156]
> as the type of request.
> I have run performance measurements on 3.0 to see the impact of this (by
> moving local requests to the end of AbstractReadExecutor#executeAsync()), but
> I haven't seen any big difference yet. I'll continue to run more tests to see
> if I can find a use case affected by this.
> Attaching a trace (3.0) where this happens. Reproduction:
> # Create a three node CCM cluster
> # Provision data with stress (rf=3)
> # In parallel:
> ## Start stress read run
> ## Run several manual read queries in cqlsh with tracing on and consistency
> level LOCAL_QUORUM (repeat as needed, as this does not happen on every read)
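
The ordering problem described in the issue can be illustrated with a small, self-contained Java sketch. This is not Cassandra code: the Request class and both dispatch methods are hypothetical names invented for the illustration, and the only assumption carried over from the report is that a local read on the fast path runs on the calling thread while remote requests are merely sent.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class LocalReadOrdering {
    /** Hypothetical request model; these are not Cassandra's real types. */
    static final class Request {
        final String kind;     // "data" or "digest"
        final String replica;  // illustrative replica name
        final boolean local;   // true if the coordinator itself is the replica

        Request(String kind, String replica, boolean local) {
            this.kind = kind;
            this.replica = replica;
            this.local = local;
        }
    }

    /** Handles requests in the given order; a local read occupies the calling thread. */
    static List<String> dispatchInOrder(List<Request> requests) {
        List<String> events = new ArrayList<>();
        for (Request r : requests) {
            if (r.local) {
                // Fast path: nothing after this point is sent until the read returns.
                events.add("run local " + r.kind + " read on " + r.replica);
            } else {
                events.add("send " + r.kind + " request to " + r.replica);
            }
        }
        return events;
    }

    /** Sends every remote request first, then runs the local reads. */
    static List<String> dispatchLocalLast(List<Request> requests) {
        List<Request> reordered = new ArrayList<>();
        for (Request r : requests) {
            if (!r.local) reordered.add(r);
        }
        for (Request r : requests) {
            if (r.local) reordered.add(r);
        }
        return dispatchInOrder(reordered);
    }

    public static void main(String[] args) {
        // Data requests are issued before digest requests, as in the report.
        List<Request> requests = Arrays.asList(
                new Request("data", "node1", true),
                new Request("digest", "node2", false));

        System.out.println("data first:  " + dispatchInOrder(requests));
        System.out.println("local last:  " + dispatchLocalLast(requests));
    }
}
```

In the first ordering the digest request to node2 is only sent after the local data read has completed; in the second it is sent before the local read starts, which is the behaviour the quoted code comment appears to intend.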