Have you thought about contributing this back to storm itself?  From what I 
have read and a quick pass through the code, it looks like from a user 
perspective you replace one builder with another.  From a code perspective, it 
looks like you replace the fields grouping with one that understands the 
batching semantics, and wrap the bolts/spouts with batch/unbatch logic.  This 
feels like something that could easily fit into storm with minor modifications 
and give users more control over the latency vs. throughput trade-off in their 
topologies.  Making it an official part of storm would also allow us to update 
the metrics system to understand the batching and display results on a 
per-tuple basis instead of on a per-batch basis.
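To make the "replace one builder with another" point concrete, here is a rough 
sketch of what the user-facing swap could look like.  The batching builder's 
name and constructor are purely illustrative (not Aeolus's actual API), and 
MySpout/MyBolt stand in for existing user code:

    import backtype.storm.topology.TopologyBuilder;
    import backtype.storm.tuple.Fields;

    public class BuilderSwapSketch {
        public static void main(String[] args) {
            // plain Storm (0.9.x): tuple-by-tuple transfer between components
            TopologyBuilder builder = new TopologyBuilder();
            builder.setSpout("spout", new MySpout(), 1);
            builder.setBolt("bolt", new MyBolt(), 4)
                   .fieldsGrouping("spout", new Fields("key"));

            // hypothetical batching-aware drop-in: same wiring plus a batch
            // size; it would swap in a batching-aware fields grouping and
            // wrap the spout/bolt with batch/unbatch logic internally
            BatchingTopologyBuilder batching =
                    new BatchingTopologyBuilder(30 /* batch size */);
            batching.setSpout("spout", new MySpout(), 1);
            batching.setBolt("bolt", new MyBolt(), 4)
                    .fieldsGrouping("spout", new Fields("key"));
        }
    }

The existing spout and bolt code would stay untouched; only the topology 
wiring changes.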
 - Bobby
 


     On Thursday, May 28, 2015 5:54 AM, Matthias J. Sax 
<[email protected]> wrote:

 Hi Manu,

please find a simple benchmark evaluation on Storm 0.9.3 using the
following links (it's too much content to attach to this email).

https://www2.informatik.hu-berlin.de/~saxmatti/storm-aeolus-benchmark/batchingBenchmark-spout-batching-0.pdf

The file shows the results for batch sizes 0 to 4. You can replace the
last "0" in the URL with values up to 16 to get the results for higher
batch sizes.

What you can basically observe is that the maximum achieved data rate
in the non-batching case is about 250,000 tuples per second (tps), while
a batch size of about 30 increases it to 2,000,000 tps (with high
fluctuation, which decreases with even higher batch sizes).

The benchmark uses a single spout (dop=1) and a single bolt (dop=1) and
measures the output/input rate (in tps) as well as the network traffic
(in KB/s) for different batch sizes.

The spout emits simple single-attribute tuples (of type Integer) and is
configured to emit at a dedicated (stable) output rate. We did
multiple runs in the benchmark, combining different output rates (from
200,000 tps to 2,000,000 tps in steps of 200,000) with different batch
sizes (from 1 to 80).
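
For illustration, here is a minimal sketch of such a fixed-rate spout.
This is not the actual benchmark code (that is in the repository
mentioned below); the class name and details are illustrative only:

    import java.util.Map;
    import backtype.storm.spout.SpoutOutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseRichSpout;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Values;

    // Emits single-attribute Integer tuples at a stable target rate.
    public class FixedRateIntegerSpout extends BaseRichSpout {
        private final long emitIntervalNanos;
        private SpoutOutputCollector collector;
        private long nextEmitTime;
        private int counter = 0;

        public FixedRateIntegerSpout(long targetTps) {
            this.emitIntervalNanos = 1_000_000_000L / targetTps;
        }

        @Override
        public void open(Map conf, TopologyContext context,
                SpoutOutputCollector collector) {
            this.collector = collector;
            this.nextEmitTime = System.nanoTime();
        }

        @Override
        public void nextTuple() {
            // emit only when the next slot is due, to keep the rate stable
            if (System.nanoTime() >= nextEmitTime) {
                collector.emit(new Values(counter++));
                nextEmitTime += emitIntervalNanos;
            }
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("value"));
        }
    }

In the benchmark topology, both this spout and the consuming bolt run
with dop=1, as mentioned above.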

Each run used a different configured spout output rate and
consists of 4 plots showing the measured network traffic and output/input
rate for spout and bolt. The plots might be hard to read (they were
designed for ourselves only, not for publishing). If you have questions
about them, please let me know.

We ran the experiment on our local cluster. Each node has two 2GHz Xeon
E5-2620 CPUs with 6 cores and 24GB of main memory. The nodes are connected
via 1Gbit Ethernet (10Gbit switch).

The code and scripts for running the benchmark are on GitHub, too;
please refer to the Maven module "monitoring". So you should be able to
run the benchmark on your own hardware.

-Matthias



On 05/28/2015 08:44 AM, Manu Zhang wrote:
> Hi Matthias,
> 
> The project looks interesting. Do you have any detailed performance data
> compared with the latest storm versions (0.9.3 / 0.9.4)?
> 
> Thanks,
> Manu Zhang
> 
> On Tue, May 26, 2015 at 11:52 PM, Matthias J. Sax <
> [email protected]> wrote:
> 
>> Dear Storm community,
>>
>> we would like to share our project Aeolus with you. While the project is
>> not finished, our first component --- a transparent batching layer ---
>> is available now.
>>
>> Aeolus' batching component is a transparent layer that can increase
>> Storm's throughput by an order of magnitude while keeping tuple-by-tuple
>> processing semantics. Batching happens transparently to the system and the
>> user code. Thus, it can be used without changing existing code.
>>
>> Aeolus is available under the Apache License 2.0, and we would be happy
>> about any feedback. If you would like to try it out, you can download
>> Aeolus from our git repository:
>>        https://github.com/mjsax/aeolus
>>
>>
>> Happy hacking,
>>  Matthias
>>
>>
> 

