[ 
https://issues.apache.org/jira/browse/CASSANDRA-4718?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985978#comment-13985978
 ] 

Benedict edited comment on CASSANDRA-4718 at 4/30/14 7:46 PM:
--------------------------------------------------------------

bq. The Semaphore is blocking (by design)

It's non-blocking until you run out of permits, at which point it must block. 
We have _many more_ shared counters than this semaphore, so I highly doubt it 
will be an issue: if we did nothing but spin on updating it we could probably 
push several thousand times our current op-rate, and in reality we will be 
doing a lot of work in between, so contention is highly unlikely to be a 
problem, although it will incur a slight QPI penalty - nothing we don't incur 
all over the place already. That said, I have nothing against only 
conditionally creating the Semaphore, which would eliminate it as a cost 
anywhere it isn't necessary.
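
For illustration, here is a rough sketch of what I mean by conditionally 
creating it (the class and method names are made up, this is not code from 
the patch):

{code:java}
import java.util.concurrent.Executor;
import java.util.concurrent.Semaphore;

// Hypothetical sketch: the semaphore only exists when a queue limit is
// actually configured, so unlimited stages pay nothing for it.
final class BoundedTaskSubmitter
{
    private final Executor delegate;
    private final Semaphore permits; // null when no limit is configured

    BoundedTaskSubmitter(Executor delegate, int maxQueuedTasks)
    {
        this.delegate = delegate;
        this.permits = maxQueuedTasks > 0 ? new Semaphore(maxQueuedTasks) : null;
    }

    void submit(final Runnable task)
    {
        // returns immediately while permits remain; only blocks once the
        // configured limit has been exhausted
        if (permits != null)
            permits.acquireUninterruptibly();
        delegate.execute(new Runnable()
        {
            public void run()
            {
                try
                {
                    task.run();
                }
                finally
                {
                    if (permits != null)
                        permits.release();
                }
            }
        });
    }
}
{code}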

bq.  but any solution is better than forking FJP

It isn't forked - this is all in the same extension class that you 
introduced...?
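
To be concrete about what "extension class" means, something roughly along 
these lines (a hypothetical outline, not the actual class from the patch):

{code:java}
import java.util.concurrent.ForkJoinPool;

// Hypothetical outline: the stock JDK ForkJoinPool is subclassed, not copied,
// so none of its internals are touched; only the extra behaviour lives here.
public class ExtendedForkJoinPool extends ForkJoinPool
{
    public ExtendedForkJoinPool(int parallelism)
    {
        // asyncMode = true gives FIFO ordering for submitted tasks, which is
        // closer to how a normal stage executor behaves
        super(parallelism, ForkJoinPool.defaultForkJoinWorkerThreadFactory, null, true);
    }

    @Override
    public void execute(Runnable task)
    {
        // any extra bookkeeping wraps the call; the pool itself is unmodified
        super.execute(task);
    }
}
{code}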

bq. I literally have no idea what this means.

FJP uses an exclusive lock for enqueueing work onto the pool, and it does more 
work whilst holding that lock, so it is likely to take longer within the 
critical section. The second patch I uploaded attempts to mitigate this for 
native transport threads, as those microseconds are actually a pretty big deal 
when dealing with a flood of tiny messages.
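
As a crude illustration of why those per-submission microseconds matter when 
handing off work from a non-worker thread (illustrative only - a fair 
comparison would need JMH and warm-up; the class here is hypothetical):

{code:java}
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Rough harness: time how long an external thread takes to hand off a flood
// of tiny tasks to each pool. The task does almost nothing, so the measured
// cost is dominated by the submission path itself.
public class SubmitCost
{
    public static void main(String[] args)
    {
        final AtomicLong sink = new AtomicLong();
        Runnable tiny = new Runnable() { public void run() { sink.incrementAndGet(); } };

        ForkJoinPool fjp = new ForkJoinPool(4);
        ThreadPoolExecutor tpe = new ThreadPoolExecutor(4, 4, 1, TimeUnit.MINUTES,
                                                        new LinkedBlockingQueue<Runnable>());
        int n = 1000000;

        long start = System.nanoTime();
        for (int i = 0; i < n; i++)
            fjp.execute(tiny);
        System.out.println("FJP submit ns/op: " + (System.nanoTime() - start) / n);

        start = System.nanoTime();
        for (int i = 0; i < n; i++)
            tpe.execute(tiny);
        System.out.println("TPE submit ns/op: " + (System.nanoTime() - start) / n);

        fjp.shutdown();
        tpe.shutdown();
    }
}
{code}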

bq.  As we got these data points largely for free from TPE, I guess it made 
sense to expose them, but if we have to go out of our way to fabricate a subset 
of them for FJP, I propose we drop them going forward (for FJP, at least).

I don't really mind, but I think you're overestimating the penalty for 
maintaining these counters.
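
The bookkeeping in question is roughly of this shape (a hypothetical sketch, 
not the patch code) - a couple of atomic updates per task:

{code:java}
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: two shared counters bumped once per task. The cost is
// one fetch-and-add on submission and one on completion, which is small
// relative to the work each task actually performs.
final class CountingExecutor implements Executor
{
    private final Executor delegate;
    final AtomicLong submitted = new AtomicLong();
    final AtomicLong completed = new AtomicLong();

    CountingExecutor(Executor delegate)
    {
        this.delegate = delegate;
    }

    public void execute(final Runnable task)
    {
        submitted.incrementAndGet();
        delegate.execute(new Runnable()
        {
            public void run()
            {
                try
                {
                    task.run();
                }
                finally
                {
                    completed.incrementAndGet();
                }
            }
        });
    }

    long pendingTasks()
    {
        return submitted.get() - completed.get();
    }
}
{code}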


> More-efficient ExecutorService for improved throughput
> ------------------------------------------------------
>
>                 Key: CASSANDRA-4718
>                 URL: https://issues.apache.org/jira/browse/CASSANDRA-4718
>             Project: Cassandra
>          Issue Type: Improvement
>            Reporter: Jonathan Ellis
>            Assignee: Jason Brown
>            Priority: Minor
>              Labels: performance
>             Fix For: 2.1.0
>
>         Attachments: 4718-v1.patch, PerThreadQueue.java, baq vs trunk.png, op 
> costs of various queues.ods, stress op rate with various queues.ods, 
> v1-stress.out
>
>
> Currently all our execution stages dequeue tasks one at a time.  This can 
> result in contention between producers and consumers (although we do our best 
> to minimize this by using LinkedBlockingQueue).
> One approach to mitigating this would be to make consumer threads do more 
> work in "bulk" instead of just one task per dequeue.  (Producer threads tend 
> to be single-task oriented by nature, so I don't see an equivalent 
> opportunity there.)
> BlockingQueue has a drainTo(collection, int) method that would be perfect for 
> this.  However, no ExecutorService in the jdk supports using drainTo, nor 
> could I google one.
> What I would like to do here is create just such a beast and wire it into (at 
> least) the write and read stages.  (Other possible candidates for such an 
> optimization, such as the CommitLog and OutboundTCPConnection, are not 
> ExecutorService-based and will need to be one-offs.)
> AbstractExecutorService may be useful.  The implementations of 
> ICommitLogExecutorService may also be useful. (Despite the name these are not 
> actual ExecutorServices, although they share the most important properties of 
> one.)
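
For reference, a rough sketch of the bulk-dequeue consumer described in the 
issue above (hypothetical code, not taken from any of the attached patches):

{code:java}
import java.util.ArrayList;
import java.util.concurrent.BlockingQueue;

// Sketch of the drainTo() idea: block for one task, then opportunistically
// drain up to a batch more, so consumers touch the shared queue less often.
public class BulkConsumer implements Runnable
{
    private final BlockingQueue<Runnable> queue;
    private final int batchSize;

    public BulkConsumer(BlockingQueue<Runnable> queue, int batchSize)
    {
        this.queue = queue;
        this.batchSize = batchSize;
    }

    public void run()
    {
        ArrayList<Runnable> batch = new ArrayList<Runnable>(batchSize);
        try
        {
            while (!Thread.currentThread().isInterrupted())
            {
                batch.add(queue.take());              // wait for at least one task
                queue.drainTo(batch, batchSize - 1);  // then grab a batch if available
                for (Runnable task : batch)
                    task.run();
                batch.clear();
            }
        }
        catch (InterruptedException e)
        {
            Thread.currentThread().interrupt();
        }
    }
}
{code}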



--
This message was sent by Atlassian JIRA
(v6.2#6252)
