[ 
https://issues.apache.org/jira/browse/CASSANDRA-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Brown updated CASSANDRA-4718:
-----------------------------------

    Attachment: 4718-v1.patch
                v1-stress.out

After a several-month hiatus, I am digging into this again. This time around, 
though, I have real hardware to test on, and thus the results are more 
consistent across executions (no cloud provider intermittently throttling me). 

Testing the thrift interface (both sync and hsha), the short story is that 
throughput is up ~20% vs. TPE, and the 95th/99th percentile latencies are down 
40-60%. The 99.9th percentile, however, is a bit trickier. In some of my tests 
it is down almost 80%, and in others it is up 40-50%. I need to dig in further 
to understand what is going on (not sure if it’s because of a shared env, 
reading across NUMA nodes, and so on). perf and likwid are my friends in this 
investigation.

As to testing the native protocol interface, I’ve only tested writes (new 2.1 
stress seems broken on reads) and I get double the throughput and 40-50% lower 
latencies across the board.  

My test cluster consists of three machines, each with 32 cores, 2 sockets (2 
NUMA nodes), 132G memory, and a 2.6.39 kernel, plus a similar box that 
generates the load.

A couple of notes about this patch:
* RequestThreadPoolExecutor now decorates a ForkJoinPool (FJP). Previously we 
had a ThreadPoolExecutor (TPE), which contains, of course, a (bounded) queue. 
The bounded queue helped with back pressure from incoming requests. With a 
FJP, there is no queue to help with back pressure, as the FJP always enqueues 
a task (without blocking). Not sure if we still want/need that back pressure 
here.
* As ForkJoinPool doesn’t expose much in terms of usage metrics (like total 
completed) compared to ThreadPoolExecutor, the ForkJoinPoolMetrics is similarly 
barren. Not sure if we want to capture this on our own in DFJP or something 
like it. 
* I have made similar FJP changes to the disruptor-thrift library, and once 
this patch is committed, I’ll work with Pavel to make the changes over there 
and pull in the updated jar.
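
To make the first two notes concrete, here is a minimal sketch (not part of the 
attached patch) of how back pressure and a completed-task count could be layered 
on top of a ForkJoinPool. The class name BoundedForkJoinExecutor and the 
maxPending parameter are hypothetical and used only for illustration. A 
Semaphore stands in for the bounded queue, and the sketch assumes Java 8 lambdas 
for brevity.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical wrapper, for illustration only; not code from 4718-v1.patch. */
public class BoundedForkJoinExecutor {
    private final ForkJoinPool pool;
    private final Semaphore permits;                      // caps queued + running tasks
    private final AtomicLong completed = new AtomicLong(); // metric FJP lacks

    public BoundedForkJoinExecutor(int parallelism, int maxPending) {
        this.pool = new ForkJoinPool(parallelism);
        this.permits = new Semaphore(maxPending);
    }

    /** Blocks the submitting thread while maxPending tasks are outstanding. */
    public void execute(Runnable task) {
        permits.acquireUninterruptibly();
        try {
            pool.execute(() -> {
                try {
                    task.run();
                } finally {
                    completed.incrementAndGet();
                    permits.release();
                }
            });
        } catch (RuntimeException e) {
            permits.release(); // e.g. the pool rejected the task after shutdown
            throw e;
        }
    }

    public long completedTasks() {
        return completed.get();
    }

    /** Returns true if all submitted tasks finished within the timeout. */
    public boolean shutdownAndAwait(long timeout, TimeUnit unit) {
        pool.shutdown();
        try {
            return pool.awaitTermination(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

Whether blocking the submitter is acceptable depends on which thread calls 
execute(); blocking an I/O thread may be worse than having no back pressure.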

As a side note, it looks like the quasar project 
(http://docs.paralleluniverse.co/quasar/) indicates that the jsr166e jar has 
some optimizations (http://blog.paralleluniverse.co/2013/05/02/quasar-pulsar/) 
over the jdk7 implementation (which are included in jdk8). I pulled in those 
changes and stress tested, but didn’t see much of a difference for our use 
case. I can, however, pull them in again if anyone feels strongly.


> More-efficient ExecutorService for improved throughput
> ------------------------------------------------------
>
>                 Key: CASSANDRA-4718
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4718
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jason Brown
>            Priority: Minor
>              Labels: performance
>             Fix For: 2.1
>
>         Attachments: 4718-v1.patch, PerThreadQueue.java, baq vs trunk.png, op 
> costs of various queues.ods, stress op rate with various queues.ods, 
> v1-stress.out
>
>
> Currently all our execution stages dequeue tasks one at a time.  This can 
> result in contention between producers and consumers (although we do our best 
> to minimize this by using LinkedBlockingQueue).
> One approach to mitigating this would be to make consumer threads do more 
> work in "bulk" instead of just one task per dequeue.  (Producer threads tend 
> to be single-task oriented by nature, so I don't see an equivalent 
> opportunity there.)
> BlockingQueue has a drainTo(collection, int) method that would be perfect for 
> this.  However, no ExecutorService in the jdk supports using drainTo, nor 
> could I google one.
> What I would like to do here is create just such a beast and wire it into (at 
> least) the write and read stages.  (Other possible candidates for such an 
> optimization, such as the CommitLog and OutboundTCPConnection, are not 
> ExecutorService-based and will need to be one-offs.)
> AbstractExecutorService may be useful.  The implementations of 
> ICommitLogExecutorService may also be useful. (Despite the name these are not 
> actual ExecutorServices, although they share the most important properties of 
> one.)
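
The bulk-dequeue idea in the quoted description can be sketched roughly as 
follows. This is only an illustration of BlockingQueue.drainTo(collection, int) 
in a consumer loop body, not code from the ticket; the class BatchDrainExample 
and the method runBatch are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical example, for illustration only. */
public class BatchDrainExample {
    /**
     * Blocks until one task is available, then drains up to maxBatch - 1 more
     * without blocking and runs the whole batch on the calling thread.
     * Returns the number of tasks run (0 if interrupted while waiting).
     */
    static int runBatch(BlockingQueue<Runnable> queue, int maxBatch) {
        List<Runnable> batch = new ArrayList<>(maxBatch);
        try {
            batch.add(queue.take());            // wait for the first task
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return 0;
        }
        queue.drainTo(batch, maxBatch - 1);     // grab the rest in one call
        for (Runnable task : batch) {
            task.run();
        }
        return batch.size();
    }

    public static void main(String[] args) {
        BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
        for (int i = 0; i < 10; i++) {
            final int id = i;
            queue.add(() -> System.out.println("task " + id));
        }
        int ran = runBatch(queue, 4);
        System.out.println("ran " + ran + ", remaining " + queue.size());
    }
}
```

A consumer thread would call runBatch in a loop, so a busy queue costs one 
take() plus one drainTo() per batch rather than one dequeue per task.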



--
This message was sent by Atlassian JIRA
(v6.2#6252)
