[ 
https://issues.apache.org/jira/browse/CASSANDRA-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benedict updated CASSANDRA-4718:
--------------------------------

    Attachment: jason_write.svg
                jason_read_latency.svg
                jason_read.svg
                aws.svg

I've attached read throughput graphs for a run on a 2 node AWS c3.8xlarge with 
1 load generating node (i've ignored write throughputs for now as the local 
disks are atrocious, so the numbers are pretty meaningless and all over the 
place), and graphs for Jason's runs read/write runs, including the read 
latencies (most representative of networking as they should all in-memory 
operations)

It looks like on modern/patched kernels and/or hyperthreaded modern CPUs the 
lse-batchnetty patch wins hands down - latency and throughput are all 
substantially better. I'll now try to see if a little bit of tuning can yield 
better results for all systems, as the prior version of the LSE branch was 
hitting the same throughput as 2.1-batchnetty does on our internal 
no-hyperthreading 2.6 clusters. So if we can bring that back up the case should 
be pretty compelling.

> More-efficient ExecutorService for improved throughput
> ------------------------------------------------------
>
>                 Key: CASSANDRA-4718
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4718
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jason Brown
>            Priority: Minor
>              Labels: performance
>             Fix For: 2.1.0
>
>         Attachments: 4718-v1.patch, PerThreadQueue.java, aws.svg, 
> backpressure-stress.out.txt, baq vs trunk.png, 
> belliotsmith_branches-stress.out.txt, jason_read.svg, jason_read_latency.svg, 
> jason_write.svg, op costs of various queues.ods, stress op rate with various 
> queues.ods, v1-stress.out
>
>
> Currently all our execution stages dequeue tasks one at a time.  This can 
> result in contention between producers and consumers (although we do our best 
> to minimize this by using LinkedBlockingQueue).
> One approach to mitigating this would be to make consumer threads do more 
> work in "bulk" instead of just one task per dequeue.  (Producer threads tend 
> to be single-task oriented by nature, so I don't see an equivalent 
> opportunity there.)
> BlockingQueue has a drainTo(collection, int) method that would be perfect for 
> this.  However, no ExecutorService in the jdk supports using drainTo, nor 
> could I google one.
> What I would like to do here is create just such a beast and wire it into (at 
> least) the write and read stages.  (Other possible candidates for such an 
> optimization, such as the CommitLog and OutboundTCPConnection, are not 
> ExecutorService-based and will need to be one-offs.)
> AbstractExecutorService may be useful.  The implementations of 
> ICommitLogExecutorService may also be useful. (Despite the name these are not 
> actual ExecutorServices, although they share the most important properties of 
> one.)



--
This message was sent by Atlassian JIRA
(v6.2#6252)

Reply via email to