[
https://issues.apache.org/jira/browse/CASSANDRA-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997754#comment-13997754
]
Pavel Yaskevich commented on CASSANDRA-4718:
--------------------------------------------
bq. What about writes, that's a pretty big scenario this helps improve
Ryan's latest numbers are from a write workload.
bq. Well, except that we expect in general for recent data to be accessed most
often, or data to be accessed according to a zipf distribution, and in both of
these cases caching helps to keep a significant portion of the data we're
accessing in memory. Also, more users are getting incredibly performant SSDs
that can respond to queries in time horizons measured in microseconds, and as
this becomes the norm the distinction also becomes less important.
I always thought that Zipf's law applies to scientific data, does it not? An SSD
can be performant, but you can't get its full speed yet; the closest you can
currently get is Linux 3.13+ with multiqueue support enabled.
bq. Right, but we've always targetted "total data larger than memory, hot data
more or less fits." So I absolutely think this ticket is relevant for a lot of
use cases.
Exactly, "hot data more or less fits" so the problem is that once you get into
page page reclaim and disk reads (even SSDs), improvements maid here are no
longer doing anything helpful, I think that would be clearly visible on the
benchmarks to come.
> More-efficient ExecutorService for improved throughput
> ------------------------------------------------------
>
> Key: CASSANDRA-4718
> URL: https://issues.apache.org/jira/browse/CASSANDRA-4718
> Project: Cassandra
> Issue Type: Improvement
> Reporter: Jonathan Ellis
> Assignee: Benedict
> Priority: Minor
> Labels: performance
> Fix For: 2.1.0
>
> Attachments: 4718-v1.patch, PerThreadQueue.java, aws.svg,
> aws_read.svg, backpressure-stress.out.txt, baq vs trunk.png,
> belliotsmith_branches-stress.out.txt, jason_read.svg, jason_read_latency.svg,
> jason_write.svg, op costs of various queues.ods, stress op rate with various
> queues.ods, v1-stress.out
>
>
> Currently all our execution stages dequeue tasks one at a time. This can
> result in contention between producers and consumers (although we do our best
> to minimize this by using LinkedBlockingQueue).
> One approach to mitigating this would be to make consumer threads do more
> work in "bulk" instead of just one task per dequeue. (Producer threads tend
> to be single-task oriented by nature, so I don't see an equivalent
> opportunity there.)
> BlockingQueue has a drainTo(collection, int) method that would be perfect for
> this. However, no ExecutorService in the JDK supports using drainTo, nor
> could I google one.
> What I would like to do here is create just such a beast and wire it into (at
> least) the write and read stages. (Other possible candidates for such an
> optimization, such as the CommitLog and OutboundTCPConnection, are not
> ExecutorService-based and will need to be one-offs.)
> AbstractExecutorService may be useful. The implementations of
> ICommitLogExecutorService may also be useful. (Despite the name these are not
> actual ExecutorServices, although they share the most important properties of
> one.)
--
This message was sent by Atlassian JIRA
(v6.2#6252)