[
https://issues.apache.org/jira/browse/CASSANDRA-6995?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13963400#comment-13963400
]
Jason Brown commented on CASSANDRA-6995:
----------------------------------------
[~xedin] Ahh, I hadn't thought about a new stage for the coordinator; that way
there wouldn't be contention on the read or write stages between coordinator
and data-node work.
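For concreteness, a rough sketch of what such a dedicated coordinator stage
could look like - just a separate executor sized independently of the read and
write stages. The name and sizing below are purely illustrative; this is not an
existing Cassandra stage:
{code:java}
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical: a dedicated stage for coordinator-side work, so it never
// competes for threads with the READ/MUTATION stages serving local data.
public class CoordinatorStage
{
    private static final ExecutorService COORDINATOR =
        Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());

    public static void execute(Runnable coordinatorTask)
    {
        COORDINATOR.execute(coordinatorTask);
    }
}
{code}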
bq. remote read/write requests - I think they should be treated under the same
concurrency quota as thrift/cql requests, as they consume system resources just
the same, so scheduling them to the same stages would provide appropriate
back-pressure to the client instead of internally overloading the system ….
OK, I can see the argument here for additional back pressure and for not
punishing the internal systems - it does seem a bit different from the original
intent of this ticket, though :).
bq. [~vijay2win] On a separate note, shouldn't we throttle on the number of
reads from disk instead of concurrent_writes and concurrent_reads?
Wow, I like this so much better than the concurrent_reads yaml property, which
ultimately just sets the size of a thread pool. Using throughput, or disk IO
requests per <time_period>, or something similar, seems a bit more in tune with
what we are trying to do with the machine. But, alas, that might be for a
different ticket.
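To make that concrete, a purely illustrative sketch of throttling by disk read
operations per second rather than by thread-pool size, using Guava's
RateLimiter; the class and the setting name below are made up, not an existing
cassandra.yaml option:
{code:java}
import com.google.common.util.concurrent.RateLimiter;

// Hypothetical throttle: cap the rate of reads that actually go to disk,
// instead of capping the number of reader threads (concurrent_reads).
public class DiskReadThrottle
{
    private final RateLimiter iopsLimiter;

    public DiskReadThrottle(double maxDiskReadsPerSecond) // made-up setting
    {
        this.iopsLimiter = RateLimiter.create(maxDiskReadsPerSecond);
    }

    // Call before a read that is expected to miss the page cache; blocks until
    // a permit is available, pushing back-pressure onto the caller.
    public void acquireForDiskRead()
    {
        iopsLimiter.acquire();
    }
}
{code}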
[~benedict]:
bq. if you know the request will not hit the disk, it should be irrelevant how
many requests are on the read stage;
How do you *know* the request will not hit the disk? I know of only two ways:
using something like mincore to check whether the mmap'ed page is, in fact, in
memory, or using something like DataStax's in-memory option
(http://www.datastax.com/documentation/datastax_enterprise/4.0/datastax_enterprise/inMemory.html).
We don't have the former, and the latter is outside the scope of the OSS project.
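Just for reference, a very rough sketch of what the former could look like via
JNA - checking page residency with mincore(2) before deciding a read is safe to
run synchronously. Cassandra has no such binding today; the class, the 4K page
assumption, and how you would obtain the mapped address are all illustrative:
{code:java}
import com.sun.jna.Library;
import com.sun.jna.Native;
import com.sun.jna.NativeLong;
import com.sun.jna.Pointer;

public final class PageResidency
{
    private interface CLib extends Library
    {
        CLib INSTANCE = Native.load("c", CLib.class);

        // int mincore(void *addr, size_t length, unsigned char *vec);
        int mincore(Pointer addr, NativeLong length, byte[] vec);
    }

    private static final long PAGE_SIZE = 4096; // assume 4K pages for the sketch

    // addr must be the page-aligned start of the mmap'ed region to check.
    public static boolean allPagesResident(Pointer addr, long length)
    {
        int pages = (int) ((length + PAGE_SIZE - 1) / PAGE_SIZE);
        byte[] vec = new byte[pages];
        if (CLib.INSTANCE.mincore(addr, new NativeLong(length), vec) != 0)
            return false; // on error, assume not resident and use the read stage
        for (byte b : vec)
            if ((b & 1) == 0) // low bit set => page is resident
                return false;
        return true;
    }
}
{code}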
bq. if this is only available to (synchronous only?) thrift clients …
It is not thrift-only. It applies to any request that a client routes to a node
holding the data and that uses CL.ONE/LOCAL_ONE.
bq. But I'd like to see evidence it is still beneficial once the change is
added to honour the read stage limit
See Vijay's comment - I think that is a very germane insight. Lacking that,
however, yes, respecting the concurrent_reads limit is required. That said, I
think Pavel's suggestion is better than twisting the existing code to use a
semaphore.
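For illustration only, a rough sketch of that semaphore variant - run the local
read on the request thread while a concurrent_reads-sized permit is available,
otherwise fall back to the read stage as today. This is not the attached
6995-v1.diff, just what honouring the read stage limit could look like:
{code:java}
import java.util.concurrent.Executor;
import java.util.concurrent.Semaphore;

// Hypothetical gate sized from concurrent_reads; names are illustrative.
public class LocalReadGate
{
    private final Semaphore permits;

    public LocalReadGate(int concurrentReads)
    {
        this.permits = new Semaphore(concurrentReads);
    }

    public void maybeExecuteLocally(Runnable localReadRunnable, Executor readStage)
    {
        if (permits.tryAcquire())
        {
            try
            {
                // Synchronous execution on the request thread: no queueing,
                // no context switch, as proposed in this ticket.
                localReadRunnable.run();
            }
            finally
            {
                permits.release();
            }
        }
        else
        {
            // At the concurrent_reads limit: dispatch to the read stage as before.
            readStage.execute(localReadRunnable);
        }
    }
}
{code}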
I think the ideas of Vijay and Pavel are reasonably close in nature, and will
spend some time thinking about those - and how they will or will not affect
this ticket.
> Execute local ONE/LOCAL_ONE reads on request thread instead of dispatching to
> read stage
> ----------------------------------------------------------------------------------------
>
> Key: CASSANDRA-6995
> URL: https://issues.apache.org/jira/browse/CASSANDRA-6995
> Project: Cassandra
> Issue Type: Improvement
> Components: Core
> Reporter: Jason Brown
> Assignee: Jason Brown
> Priority: Minor
> Labels: performance
> Fix For: 2.0.7
>
> Attachments: 6995-v1.diff, syncread-stress.txt
>
>
> When performing a read local to a coordinator node, AbstractReadExecutor will
> create a new SP.LocalReadRunnable and drop it into the read stage for
> asynchronous execution. If you are using a client that intelligently routes
> read requests to a node holding the data for a given request, and are using
> CL.ONE/LOCAL_ONE, enqueuing the SP.LocalReadRunnable and waiting for the
> context switches (and possible NUMA misses) adds unnecessary latency. We can
> reduce that latency and improve throughput by avoiding the queueing and
> thread context switching by simply executing the SP.LocalReadRunnable
> synchronously in the request thread. Testing on a three node cluster (each
> with 32 cpus, 132 GB ram) yields ~10% improvement in throughput and ~20%
> speedup on avg/95/99 percentiles (99.9% was about 5-10% improvement).
--
This message was sent by Atlassian JIRA
(v6.2#6252)