[ https://issues.apache.org/jira/browse/CASSANDRA-14983?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16745238#comment-16745238 ]
Marcus Olsson commented on CASSANDRA-14983:
-------------------------------------------

I think I have found out why background traffic is required to reproduce this. In SEPExecutor#maybeExecuteImmediately() we try to take a work permit (and no task permit), but we [check|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/concurrent/SEPExecutor.java#L166] whether there are task permits available before taking the work permit. Since the fast path does not add the task to the queue (#addTask(), which is also where the task permits are increased), there might not be any task permits available. Changing the if statement to:
{code}
if (workPermits == 0 || (takeTaskPermit && taskPermits == 0))
    return false;
{code}
made the fast path trigger more often. When I added some metrics for this, it turned out the fast path was basically never used for the read path in my setup. With the if statement change I saw it used ~80% of the time with a thread count of 128 in stress.

Unfortunately I did not see any performance difference when running tests on this with QUORUM, but it did seem to have a large effect in a scenario with a single node and rf = 1 for 3.0 (graph_local_read.html). While running stress with 32 threads (pre-SEP-1, post-SEP-1) I could see a *~7%* throughput improvement locally. When running stress with 128 threads (pre-SEP-128-1, post-SEP-128-1) the performance dropped slightly: the median latency seems lower, but the higher percentiles take a hit. For the final two tests (pre-SEP-128-cr128-1, post-SEP-128-cr128-1) I increased the concurrent_read threads from 32 -> 128. The latency results are similar to the previous run, but the throughput seems to have increased.

Note: I ran _echo 3 > /proc/sys/vm/drop_caches_ before these tests to clear the page cache, which is why there is a large ramp-up in the beginning.
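To illustrate the permit check above, here is a minimal self-contained sketch (not the actual SEPExecutor code; the 32/32-bit packing layout, class name, and method names are assumptions for illustration). It shows why requiring a task permit unconditionally starves the fast path: the fast path never enqueues a task, so task permits can legitimately be zero while work permits are available.

```java
import java.util.concurrent.atomic.AtomicLong;

// Simplified sketch of a combined permit counter claimed via a CAS loop.
// Hypothetical layout: work permits in the upper 32 bits, task permits in the lower 32.
class PermitSketch
{
    final AtomicLong permits;

    PermitSketch(int workPermits, int taskPermits)
    {
        permits = new AtomicLong(((long) workPermits << 32) | taskPermits);
    }

    static int workPermits(long p) { return (int) (p >>> 32); }
    static int taskPermits(long p) { return (int) (p & 0xFFFFFFFFL); }

    // Tries to grab a work permit; only consumes (and requires) a task permit
    // when takeTaskPermit is true.
    boolean takeWorkPermit(boolean takeTaskPermit)
    {
        while (true)
        {
            long current = permits.get();
            int work = workPermits(current);
            int task = taskPermits(current);
            // The proposed fix: a missing task permit only blocks us
            // when we actually need to take one.
            if (work == 0 || (takeTaskPermit && task == 0))
                return false;
            long next = ((long) (work - 1) << 32) | (takeTaskPermit ? task - 1 : task);
            if (permits.compareAndSet(current, next))
                return true;
        }
    }
}
```

With the original condition (failing whenever taskPermits == 0), a fast-path caller with takeTaskPermit == false would be rejected even though a work permit is free; with the condition above it succeeds.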
I also made a similar quick test for trunk (graph_local_read_trunk.html) where there seems to be a throughput improvement when using a low thread count, but the overall performance seems to have decreased with a default CCM setup (unless my environment was behaving oddly). I think this could warrant its own JIRA ticket to investigate further.

> Local reads potentially blocking remote reads
> ---------------------------------------------
>
>                 Key: CASSANDRA-14983
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-14983
>             Project: Cassandra
>          Issue Type: Bug
>          Components: Consistency/Coordination
>            Reporter: Marcus Olsson
>            Priority: Minor
>         Attachments: local_read_trace.log
>
> Since CASSANDRA-4718 there is a fast path allowing local requests to continue
> to [work in the same
> thread|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/reads/AbstractReadExecutor.java#L157]
> rather than being sent over to the read stage.
> Based on the comment
> {code:java}
> // We delay the local (potentially blocking) read till the end to avoid
> // stalling remote requests.
> {code}
> it seems like this should be performed last in the chain to avoid blocking
> remote requests, but that does not seem to be the case when the local request
> is a data request. The digest request(s) are sent after the data requests
> (and now the transient replica requests as well). When the fast path is used
> for local data/transient data requests, it blocks the next type of request
> from being sent until the local read is finished, adding additional latency
> to the request.
> In addition to this it seems like local requests are *always* data requests
> (which might not be a problem), but the log message can say either ["digest" or
> "data"|https://github.com/apache/cassandra/blob/trunk/src/java/org/apache/cassandra/service/reads/AbstractReadExecutor.java#L156]
> as the type of request.
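The ordering problem in the quoted description can be sketched as follows. This is a hypothetical simplification, not the actual AbstractReadExecutor code: the ReadRequest class, the executeAsync signature, and the log list are all made up for illustration. The point is simply that because the local read runs inline on the calling thread, deferring it until after every remote request has been dispatched avoids stalling them.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: if the local (blocking) data read ran inline before the digest
// requests were sent, those remote requests would wait on the local read.
// Deferring the local read to the end sends all remote requests first.
class DispatchSketch
{
    static final List<String> log = new ArrayList<>();

    static class ReadRequest
    {
        final String name;
        final boolean local;
        ReadRequest(String name, boolean local) { this.name = name; this.local = local; }
        void send() { log.add(name); }   // stands in for dispatch (or an inline local read)
        boolean isLocal() { return local; }
    }

    static void executeAsync(List<ReadRequest> dataRequests, List<ReadRequest> digestRequests)
    {
        ReadRequest local = null;
        for (ReadRequest r : dataRequests)
        {
            if (r.isLocal())
                local = r;               // defer the potentially blocking local read
            else
                r.send();
        }
        for (ReadRequest r : digestRequests)
            r.send();                    // remote digests go out before we block
        if (local != null)
            local.send();                // runs inline on this thread, so it goes last
    }
}
```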
> I have tried to run performance measurements to see the impact of this in 3.0
> (by moving local requests to the end of ARE#executeAsync()) but I haven't
> seen any big difference yet. I'll continue to run some more tests to see if I
> can find a use case affected by this.
> Attaching a trace (3.0) where this happens. Reproduction:
> # Create a three node CCM cluster
> # Provision data with stress (rf=3)
> # In parallel:
> ## Start stress read run
> ## Run multiple manual read queries in cqlsh with tracing on and
> ## local_quorum (as this does not always happen)

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)