[ 
https://issues.apache.org/jira/browse/CASSANDRA-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13993059#comment-13993059
 ] 

Benedict commented on CASSANDRA-4718:
-------------------------------------

I have a few branches to test, and I want to test them on a variety of 
hardware. [~enigmacurry], can you run them on our internal multi-cpu boxes and 
on an AWS c3.8xlarge 4-node cluster, to the following spec:

For each branch, run 20M inserts over 1M unique keys with 30, 90, 270 and 810 
threads, then wipe each cluster and perform a single 1M key insert, and then 
run 20M reads over 1M unique keys with the same thread counts. All told that 
should take around 3 hours for -mode cql3 native prepared; I'd then like to 
repeat the tests for -mode thrift smart.
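
To make the size of the run matrix concrete, here is a minimal sketch that 
only enumerates the steps described above. It does not invoke any stress tool; 
the class name StressPlanSketch and the step descriptions are hypothetical, 
and the ordering (all branches in one mode, then the other) is an assumption 
based on the text.

```java
import java.util.ArrayList;
import java.util.List;

public class StressPlanSketch {
    // Branch names and modes are taken from the comment; everything else is illustrative.
    static final String[] BRANCHES = {
        "4718-lse", "4718-lse-batchnetty", "4718-fjp", "4718-lowsignal", "cassandra-2.1"
    };
    static final String[] MODES = {"cql3 native prepared", "thrift smart"};
    static final int[] THREADS = {30, 90, 270, 810};

    /** Lists every step of the requested test plan, in order. */
    static List<String> plan() {
        List<String> steps = new ArrayList<>();
        for (String mode : MODES) {
            for (String branch : BRANCHES) {
                String prefix = "[" + branch + ", -mode " + mode + "] ";
                // 20M inserts over 1M unique keys at each thread count
                for (int t : THREADS) {
                    steps.add(prefix + "20M inserts over 1M keys, " + t + " threads");
                }
                // Reset, then load exactly 1M keys so reads hit a known data set
                steps.add(prefix + "wipe cluster");
                steps.add(prefix + "single 1M key insert");
                // 20M reads over the same 1M keys at each thread count
                for (int t : THREADS) {
                    steps.add(prefix + "20M reads over 1M keys, " + t + " threads");
                }
            }
        }
        return steps;
    }

    public static void main(String[] args) {
        for (String step : plan()) {
            System.out.println(step);
        }
    }
}
```

Each branch/mode pair contributes 10 steps (4 insert runs, a wipe, a 
populate, 4 read runs), so the full plan per hardware target is 100 steps.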

The branches are: 
[https://github.com/belliottsmith/cassandra/tree/4718-lse]
[https://github.com/belliottsmith/cassandra/tree/4718-lse-batchnetty]
[https://github.com/belliottsmith/cassandra/tree/4718-fjp]
[https://github.com/belliottsmith/cassandra/tree/4718-lowsignal]
[https://github.com/belliottsmith/cassandra/tree/cassandra-2.1]

Make sure you use my cassandra-2.1 so we're testing like-to-like (they're all 
rebased to the same version).

I'll elaborate on the contents of these branches later, but suffice it to say 
the 4718-lse branch contains a new executor which attempts to reduce signalling 
costs to near zero by scheduling the correct number of threads to deal with the 
level of throughput the executor has been dealing with over the previous 
(short) adjustment window. -batchnetty includes some simple batching of netty 
messages. 4718-lowsignal is an enhanced version of the patch I uploaded 
previously to this ticket, and 4718-fjp is largely unchanged.

On my own box, and on our austin test cluster, I see -lse faster than both -fjp 
and -lowsignal. However, on the austin cluster (a not-super-modern 4-cpu, 
no-hyperthreading setup) both are slower than stock 2.1: -lse only slightly, 
whereas -fjp is around 30% slower. I'll post polished numbers a little later.

> More-efficient ExecutorService for improved throughput
> ------------------------------------------------------
>
>                 Key: CASSANDRA-4718
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4718
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jason Brown
>            Priority: Minor
>              Labels: performance
>             Fix For: 2.1.0
>
>         Attachments: 4718-v1.patch, PerThreadQueue.java, 
> backpressure-stress.out.txt, baq vs trunk.png, op costs of various 
> queues.ods, stress op rate with various queues.ods, v1-stress.out
>
>
> Currently all our execution stages dequeue tasks one at a time.  This can 
> result in contention between producers and consumers (although we do our best 
> to minimize this by using LinkedBlockingQueue).
> One approach to mitigating this would be to make consumer threads do more 
> work in "bulk" instead of just one task per dequeue.  (Producer threads tend 
> to be single-task oriented by nature, so I don't see an equivalent 
> opportunity there.)
> BlockingQueue has a drainTo(collection, int) method that would be perfect for 
> this.  However, no ExecutorService in the jdk supports using drainTo, nor 
> could I google one.
> What I would like to do here is create just such a beast and wire it into (at 
> least) the write and read stages.  (Other possible candidates for such an 
> optimization, such as the CommitLog and OutboundTCPConnection, are not 
> ExecutorService-based and will need to be one-offs.)
> AbstractExecutorService may be useful.  The implementations of 
> ICommitLogExecutorService may also be useful. (Despite the name these are not 
> actual ExecutorServices, although they share the most important properties of 
> one.)
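
The bulk-dequeue idea in the quoted description can be sketched as follows. 
This is a minimal illustration only, not the executor from any of the branches 
above: the names DrainToSketch, BatchingWorker and runDemo are hypothetical, 
and the batch size is an arbitrary assumption. The worker blocks for one task 
with take(), then uses BlockingQueue.drainTo(collection, int) to pull further 
queued tasks in a single call.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class DrainToSketch {
    /** Hypothetical consumer that dequeues tasks in batches rather than one at a time. */
    static final class BatchingWorker implements Runnable {
        private final BlockingQueue<Runnable> queue;
        private final int maxBatch;

        BatchingWorker(BlockingQueue<Runnable> queue, int maxBatch) {
            this.queue = queue;
            this.maxBatch = maxBatch;
        }

        @Override
        public void run() {
            List<Runnable> batch = new ArrayList<>(maxBatch);
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    // Block until at least one task is available...
                    batch.add(queue.take());
                    // ...then take up to maxBatch - 1 more without blocking.
                    queue.drainTo(batch, maxBatch - 1);
                    for (Runnable task : batch) {
                        task.run();
                    }
                    batch.clear();
                }
            } catch (InterruptedException e) {
                // Restore the flag and exit; a real executor would handle shutdown explicitly.
                Thread.currentThread().interrupt();
            }
        }
    }

    /** Submits the given number of trivial tasks and returns how many were executed. */
    static int runDemo(int tasks, int maxBatch) {
        BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();
        AtomicInteger done = new AtomicInteger();
        CountDownLatch latch = new CountDownLatch(tasks);
        for (int i = 0; i < tasks; i++) {
            queue.add(() -> { done.incrementAndGet(); latch.countDown(); });
        }
        Thread worker = new Thread(new BatchingWorker(queue, maxBatch));
        worker.start();
        try {
            latch.await();      // wait until every task has run
            worker.interrupt(); // then stop the worker
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return done.get();
    }

    public static void main(String[] args) {
        System.out.println(runDemo(1000, 32));
    }
}
```

A single worker is used here to keep the sketch short. With several 
consumers, a large batch size can leave one thread holding work while others 
sit idle, so the batch limit is a trade-off rather than a free win.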



--
This message was sent by Atlassian JIRA
(v6.2#6252)
