I played around with this a lot when developing the feature. I wrote several micro benchmarks:

https://github.com/revans2/storm-micro-perf

From that I played around with the batch size on a 1024-deep queue, which is the default for storm:

https://docs.google.com/a/yahoo-inc.com/spreadsheets/d/12FMfzfmjiYjnN_h8rwPe5TEw-1ExoKwujKWCuGzDtmM/pubhtml

I ran two different tests, both word count: one with a single spout feeding 4 splitters, and another with 4 spouts feeding 4 splitters. I ran them on an OS X MBP ("OS X" in the spreadsheet) and on a RHEL6 VM ("Linux" in the spreadsheet).

These are micro benchmarks and do not necessarily reflect what would happen with storm. They also reflect experiments that were steps towards a final solution; they don't show the final code that is in storm today, which dropped the CPU utilization drastically and increased the throughput even more. I would suggest you take a look at

https://github.com/apache/storm/blob/master/examples/storm-starter/src/jvm/org/apache/storm/starter/ThroughputVsLatency.java

for a test you can run/modify to give you some other numbers to experiment with.

Looking at the charts, I picked a batch size of 100 because it felt like a good balance between increased throughput and increased latency. In some cases increasing the batch size too much led to lower throughput. I would suggest you play around with a batch size that is about 10% to 25% of the total size of the queue.

I have been talking to a few people in academia who are working on auto-tuning the batch size to optimize throughput for a given SLA, but that is still a ways off.

- Bobby

On Wednesday, February 10, 2016 8:59 PM, Roshan Naik <[email protected]> wrote:

Wondering if there is any rule of thumb guidance around setting topology.disruptor.batch.size for increasing throughput? For example, any correlation with topology.max.spout.pending, bolt executor count, spout count, etc?

-roshan
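[Editor's note] The 10%-25% rule of thumb above can be sketched as a tiny helper. This is not Storm code; the class and method names here are hypothetical, and in a real topology you would feed the result into the topology config under the `topology.disruptor.batch.size` key mentioned in the question, e.g. `conf.put("topology.disruptor.batch.size", size)`.

```java
// Sketch only: derive a disruptor batch-size candidate from the queue
// depth, following the 10%-25% rule of thumb from the reply above.
// "suggestedBatchSize" is a made-up helper name, not a Storm API.
public class BatchSizeSketch {
    // Take the given fraction of the queue depth, clamped to [1, queueDepth].
    static int suggestedBatchSize(int queueDepth, double fraction) {
        int size = (int) (queueDepth * fraction);
        return Math.max(1, Math.min(size, queueDepth));
    }

    public static void main(String[] args) {
        int queueDepth = 1024; // the default queue depth discussed above
        // The suggested tuning range for a 1024-deep queue:
        System.out.println(suggestedBatchSize(queueDepth, 0.10)); // 102
        System.out.println(suggestedBatchSize(queueDepth, 0.25)); // 256
    }
}
```

As the reply notes, treat the range as a starting point and benchmark: larger batches are not always faster, and in some of the measured cases throughput dropped when the batch size grew too large.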
