Thanks Bobby.
After STORM-1526 and STORM-1539, the Spout.nextTuple and Bolt.execute are
both running much faster.
The former helped improve overall throughput, but surprisingly the
latter did not.

The short version of my preliminary assessment (will share details in
another email), after playing with a few settings for the disruptor batch
size and max.spout.pending, is:

The speedup in the bolt and spout is placing much greater pressure on the
Disruptor, leading to a slightly different runtime behavior (which I don't
yet understand), causing the Disruptor to become the new bottleneck. I
suspect this might skew your prior observations from the micro benchmarks.
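For reference, these are the knobs I have been varying (the values below
are purely illustrative, not the settings from my runs):

```yaml
# topology config fragment (illustrative values only)
topology.max.spout.pending: 1000
topology.disruptor.batch.size: 100
```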

Will share more info soon in a different thread.

-roshan


On 2/11/16, 7:21 AM, "Bobby Evans" <ev...@yahoo-inc.com.INVALID> wrote:

>I played around with this a lot when developing the feature.  I wrote
>several micro benchmarks
>
>https://github.com/revans2/storm-micro-perf
>
>From that I played around with the batch size on a 1024-deep queue,
>which is the default queue size for storm.
>
>https://docs.google.com/a/yahoo-inc.com/spreadsheets/d/12FMfzfmjiYjnN_h8rw
>Pe5TEw-1ExoKwujKWCuGzDtmM/pubhtml
>I ran two different tests, both word count: one with a single spout to 4
>splitters, and another with 4 spouts to 4 splitters.  I ran them on an OS
>X MBP (OS X in the spreadsheet), and on a RHEL6 VM (Linux in the
>spreadsheet).
>These are micro benchmarks and do not necessarily reflect what would
>happen with storm.  They also reflect experiments that were steps towards
>a final solution; they don't show the final code that is in storm today,
>which dropped the CPU utilization drastically and increased the
>throughput even more.
>
>I would suggest you take a look at
>
>https://github.com/apache/storm/blob/master/examples/storm-starter/src/jvm
>/org/apache/storm/starter/ThroughputVsLatency.java
>
>for a test you can run/modify to give you some other numbers you can
>experiment with.
>
>Looking at the charts, I picked a batch size of 100 because it looked
>like a good balance between increased throughput and increased latency.
>In some cases increasing the batch size too much led to lower
>throughput.  I would suggest you play around with a batch size that is
>about 10% to 25% of the total size of the queue.
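>As a quick worked example of that rule of thumb (values illustrative,
>assuming the default 1024-deep receive queue mentioned above):

```yaml
# 10% to 25% of a 1024-deep queue => a batch size of roughly 100 to 256
topology.executor.receive.buffer.size: 1024
topology.disruptor.batch.size: 100
```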
>I have been talking to a few people in academia that are working on
>auto-tuning of the batch size to optimize throughput for a given SLA, but
>that is still a ways off.
> - Bobby 
>
>    On Wednesday, February 10, 2016 8:59 PM, Roshan Naik
><ros...@hortonworks.com> wrote:
> 
>
> Wondering if there is any rule of thumb guidance around setting
>topology.disruptor.batch.size for increasing throughput?
>For example, is there any correlation with topology.max.spout.pending,
>bolt executor count, spout count, etc.?
>-roshan
>
>  
