[ https://issues.apache.org/jira/browse/CASSANDRA-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000724#comment-14000724 ]

Benedict commented on CASSANDRA-4718:
-------------------------------------

[~xedin] why are you only counting the primary replica data? Requests will hit 
both replicas by default. If you look at the results, there is a reasonable 
amount of variability in both runs, so it's not clear that either branch is 
consistently slower or faster: at a number of points 4718-sep is faster than 
2.1, and vice versa, and since the workload is disk bound I'm inclined to think 
the patch is not what's making it perform worse. In fact, a majority of data 
points show higher throughput for 4718-sep, not for 2.1. In your first test, 
every thread count below 271 is faster; 271 looks like a blip caused by a small 
number of very slow reads affecting the very last measurement (there's a "race" 
in stress' auto mode where some measurements are still accepted after it has 
decided enough have been taken, as can be seen from the final stderr being 
above the acceptability point). 2.1 showed a similar but smaller effect at this 
thread count, so this seems likely to be random chance. In the last test it is 
faster at all thread counts despite some odd max latencies. It's only the 
middle test where it appears marginally slower, and since that test performs 
effectively the same amount of work as the first, I'm not sure it demonstrates 
much beyond the variability.

It's also worth asking what your max read concurrency is, as I'm surprised to 
see thread counts > 180 causing dramatic latency spikes on both branches when 
I'd expect the read stage to be saturated well before that point.



> More-efficient ExecutorService for improved throughput
> ------------------------------------------------------
>
>                 Key: CASSANDRA-4718
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4718
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Benedict
>            Priority: Minor
>              Labels: performance
>             Fix For: 2.1.0
>
>         Attachments: 4718-v1.patch, PerThreadQueue.java, 
> austin_diskbound_read.svg, aws.svg, aws_read.svg, 
> backpressure-stress.out.txt, baq vs trunk.png, 
> belliotsmith_branches-stress.out.txt, jason_read.svg, jason_read_latency.svg, 
> jason_write.svg, op costs of various queues.ods, stress op rate with various 
> queues.ods, stress_2014May15.txt, stress_2014May16.txt, v1-stress.out
>
>
> Currently all our execution stages dequeue tasks one at a time.  This can 
> result in contention between producers and consumers (although we do our best 
> to minimize this by using LinkedBlockingQueue).
> One approach to mitigating this would be to make consumer threads do more 
> work in "bulk" instead of just one task per dequeue.  (Producer threads tend 
> to be single-task oriented by nature, so I don't see an equivalent 
> opportunity there.)
> BlockingQueue has a drainTo(collection, int) method that would be perfect for 
> this.  However, no ExecutorService in the jdk supports using drainTo, nor 
> could I google one.
> What I would like to do here is create just such a beast and wire it into (at 
> least) the write and read stages.  (Other possible candidates for such an 
> optimization, such as the CommitLog and OutboundTCPConnection, are not 
> ExecutorService-based and will need to be one-offs.)
> AbstractExecutorService may be useful.  The implementations of 
> ICommitLogExecutorService may also be useful. (Despite the name these are not 
> actual ExecutorServices, although they share the most important properties of 
> one.)
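
For illustration, here is a minimal sketch of the bulk-dequeue idea the 
description refers to: consumer threads block for one task with take() and then 
opportunistically drain more with BlockingQueue.drainTo(collection, int). The 
class and constant names (BatchingExecutorSketch, MAX_BATCH) are hypothetical; 
this is not the attached patch, just an outline of the technique under 
discussion.

    import java.util.ArrayList;
    import java.util.List;
    import java.util.concurrent.BlockingQueue;
    import java.util.concurrent.LinkedBlockingQueue;

    // Sketch only: consumers dequeue tasks in batches rather than one at a
    // time, amortizing producer/consumer contention on the queue.
    public class BatchingExecutorSketch {
        private static final int MAX_BATCH = 64; // hypothetical batch size

        private final BlockingQueue<Runnable> queue = new LinkedBlockingQueue<>();

        public void submit(Runnable task) {
            queue.add(task);
        }

        public void start(int threads) {
            for (int i = 0; i < threads; i++) {
                Thread t = new Thread(this::consumerLoop, "batch-consumer-" + i);
                t.setDaemon(true);
                t.start();
            }
        }

        private void consumerLoop() {
            List<Runnable> batch = new ArrayList<>(MAX_BATCH);
            while (!Thread.currentThread().isInterrupted()) {
                try {
                    // Block for the first task, then drain up to MAX_BATCH - 1
                    // more without blocking again.
                    batch.add(queue.take());
                    queue.drainTo(batch, MAX_BATCH - 1);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                for (Runnable task : batch) {
                    try {
                        task.run();
                    } catch (RuntimeException e) {
                        // A failing task should not kill the consumer thread.
                        e.printStackTrace();
                    }
                }
                batch.clear();
            }
        }
    }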



--
This message was sent by Atlassian JIRA
(v6.2#6252)
