Marcus Olsson created CASSANDRA-14983:
-----------------------------------------

             Summary: Local reads potentially blocking remote reads
                 Key: CASSANDRA-14983
                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14983
             Project: Cassandra
          Issue Type: Bug
          Components: Consistency/Coordination
            Reporter: Marcus Olsson
         Attachments: local_read_trace.log

Since CASSANDRA-4718 there is a fast path allowing local requests to continue to
[work in the same thread|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/reads/AbstractReadExecutor.java#L157]
rather than being sent over to the read stage.

Based on the comment
{code:java}
// We delay the local (potentially blocking) read till the end to avoid
// stalling remote requests.
{code}
it seems like the local read should be performed last in the chain to avoid
blocking remote requests, but that does not seem to be the case when the local
request is a data request. The digest request(s) are sent only after the data
requests (and now the transient replica requests as well) have been sent. When
the fast path is used for a local data or transient data request, the next type
of request is not sent until the local read has finished, which adds latency to
the request.
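
The effect described above can be illustrated with a small simulated clock. This is only a sketch and not Cassandra code: the Request class, the startTimes method and the 5 ms local read cost are hypothetical, and a remote send is assumed to cost nothing on the calling thread.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class InlineLocalReadSketch {
    /** Hypothetical request descriptor; not a Cassandra class. */
    static final class Request {
        final String name;
        final boolean local;

        Request(String name, boolean local) {
            this.name = name;
            this.local = local;
        }
    }

    /**
     * Walks the requests in order on a simulated clock and returns the time (ms)
     * at which each one is started. A local request runs inline and occupies the
     * caller for localReadMillis; a remote send is treated as instantaneous.
     */
    static Map<String, Long> startTimes(List<Request> order, long localReadMillis) {
        Map<String, Long> started = new LinkedHashMap<>();
        long clock = 0;
        for (Request r : order) {
            started.put(r.name, clock);
            if (r.local) {
                clock += localReadMillis; // caller is blocked until the local read finishes
            }
        }
        return started;
    }

    public static void main(String[] args) {
        List<Request> order = List.of(
                new Request("local data", true),
                new Request("remote digest", false));
        // prints {local data=0, remote digest=5}
        System.out.println(startTimes(order, 5));
    }
}
```

In this model the remote digest request is not started until the inline local data read has completed, which is the extra latency in question.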

In addition to this it seems like local requests are *always* data requests
(which might not be a problem), but the log message can say either
["digest" or "data"|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/reads/AbstractReadExecutor.java#L156]
as the type of request.

I have tried to run performance measurements to see the impact of this in 3.0 
(by moving local requests to the end of ARE#executeAsync()) but I haven't seen 
any big difference yet. I'll continue to run some more tests to see if I can 
find a use case affected by this.
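
A minimal sketch of the idea of moving local requests to the end, under stated assumptions: this is not the actual ARE#executeAsync() code or the change that was tested. The dispatchOrder method, the request type names and the addresses are hypothetical. Remote sends are logged as they are encountered, while local reads are collected and only run after every remote request of every type has been sent.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DeferLocalReadSketch {
    /**
     * Returns the order in which work would be started. targetsByType maps a
     * request type (for example "data" or "digest") to its target addresses,
     * in the order the types are processed. Local reads are deferred to the end.
     */
    static List<String> dispatchOrder(Map<String, List<String>> targetsByType, String localAddress) {
        List<String> log = new ArrayList<>();
        List<String> deferredLocal = new ArrayList<>();
        for (Map.Entry<String, List<String>> phase : targetsByType.entrySet()) {
            for (String target : phase.getValue()) {
                if (target.equals(localAddress)) {
                    // Remember the local read instead of running it inline right now.
                    deferredLocal.add("run local " + phase.getKey() + " read");
                } else {
                    log.add("send " + phase.getKey() + " request to " + target);
                }
            }
        }
        log.addAll(deferredLocal); // local (potentially blocking) work goes last
        return log;
    }

    public static void main(String[] args) {
        Map<String, List<String>> targets = new LinkedHashMap<>();
        targets.put("data", List.of("127.0.0.1", "127.0.0.2"));
        targets.put("digest", List.of("127.0.0.3"));
        dispatchOrder(targets, "127.0.0.1").forEach(System.out::println);
    }
}
```

With 127.0.0.1 as the local node, both remote requests are started before the local data read, so the inline read no longer delays the digest request.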

Attaching a trace (3.0) where this happens. Reproduction:
 # Create a three node CCM cluster
 # Provision data with stress (rf=3)
 # In parallel:
 ## Start stress read run
 ## Run multiple manual read queries in cqlsh with tracing on and consistency local_quorum (repeat as needed, since this does not happen on every query)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
