I played around with this a lot when developing the feature.  I wrote several 
micro benchmarks 

https://github.com/revans2/storm-micro-perf

From that I played around with the batch size on a 1024-deep queue, which is 
the default for Storm.

https://docs.google.com/a/yahoo-inc.com/spreadsheets/d/12FMfzfmjiYjnN_h8rwPe5TEw-1ExoKwujKWCuGzDtmM/pubhtml
I ran two tests, both word count: one with a single spout feeding 4 splitters, 
and another with 4 spouts feeding 4 splitters.  I ran them on an OS X MBP and 
on a RHEL6 VM ("Linux" in the spreadsheet).  These are micro benchmarks and do 
not necessarily reflect what would happen with Storm.  They also reflect 
experiments that were steps toward a final solution; they don't show the final 
code that is in Storm today, which dropped CPU utilization drastically and 
increased the throughput even more.
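To get a feel for why consuming in batches helps at all, here is a toy sketch (not from the benchmark repo above, and using a plain ArrayBlockingQueue rather than the LMAX Disruptor that Storm actually uses): draining up to N items per pass means one lock acquisition per batch instead of one per item.

```java
import java.util.ArrayList;
import java.util.concurrent.ArrayBlockingQueue;

public class BatchDrainSketch {
    // Drain the queue in batches of up to batchSize, returning how many
    // batches it took.  Each drainTo() call takes the queue's lock once,
    // amortizing that overhead across the whole batch.
    static int drainInBatches(ArrayBlockingQueue<Integer> queue, int batchSize) {
        ArrayList<Integer> batch = new ArrayList<>(batchSize);
        int batches = 0;
        while (!queue.isEmpty()) {
            batch.clear();
            queue.drainTo(batch, batchSize); // up to batchSize items per lock
            batches++;
        }
        return batches;
    }

    public static void main(String[] args) {
        // 1024-deep queue, matching Storm's default depth
        ArrayBlockingQueue<Integer> queue = new ArrayBlockingQueue<>(1024);
        for (int i = 0; i < 250; i++) {
            queue.offer(i);
        }
        // 250 items at a batch size of 100 -> batches of 100, 100, 50
        System.out.println("batches: " + drainInBatches(queue, 100));
    }
}
```

The Disruptor avoids locks entirely, so the real win in Storm is different in mechanism, but the trade-off is the same shape: bigger batches amortize per-item overhead at the cost of items sitting in the queue longer.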

I would suggest you take a look at 

https://github.com/apache/storm/blob/master/examples/storm-starter/src/jvm/org/apache/storm/starter/ThroughputVsLatency.java

for a test you can run and modify to generate some other numbers to experiment 
with.

Looking at the charts, I picked a batch size of 100 because it felt like a good 
balance between increased throughput and increased latency.  In some cases 
increasing the batch size too much led to lower throughput.  I would suggest 
you experiment with a batch size that is about 10% to 25% of the total size of 
the queue.

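As a hypothetical sketch of that rule of thumb (the string keys match the config names from the question and Storm's defaults file; in a real topology you would put them on your org.apache.storm.Config before submitting):

```java
import java.util.HashMap;
import java.util.Map;

public class BatchSizeConf {
    // Derive the disruptor batch size as a fraction of the queue depth,
    // per the 10%-25% suggestion above.
    static Map<String, Object> batchConf(int queueDepth, double fraction) {
        Map<String, Object> conf = new HashMap<>();
        conf.put("topology.executor.receive.buffer.size", queueDepth);
        conf.put("topology.disruptor.batch.size", (int) (queueDepth * fraction));
        return conf;
    }

    public static void main(String[] args) {
        // Default 1024-deep queue at ~10% gives a batch size of 102
        System.out.println(batchConf(1024, 0.10));
    }
}
```

This is only a starting point; measure with your own topology (e.g. via ThroughputVsLatency) before settling on a value.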
I have been talking to a few people in academia that are working on auto-tuning 
of the batch size to optimize throughput for a given SLA, but that is still a 
ways off.
 - Bobby 

    On Wednesday, February 10, 2016 8:59 PM, Roshan Naik 
<[email protected]> wrote:
Wondering if there is any rule of thumb guidance around setting 
topology.disruptor.batch.size for increasing throughput?
For example, any correlation with topology.max.spout.pending, bolt executor 
count, spout count, etc.?
-roshan
