Hi Bobby,

I had not thought about it, but if the community is interested in it, I
would be happy to contribute it. :)

However, I am not very familiar with the actual structure of Storm's
code base, and I would need some pointers to integrate it into the system
correctly and cleanly.

I would claim to understand the internals of Storm quite well; however,
so far I have more of a user's perspective on the system.

If I am to work on it, it might be a good idea to open a JIRA issue and
assign it to me, and we can take it from there.
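
The batch/unbatch wrapping that Bobby describes below can be sketched in a
few lines of plain Java. This is only a minimal illustration of the
buffering idea, deliberately independent of Storm's API; the class names
`Batcher` and `Unbatcher` are hypothetical and are not Aeolus' actual
interface:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Sender side: buffer outgoing tuples and forward one batch per
// `batchSize` tuples. In a real topology, `downstream` would be the
// original collector's emit call; here it is an abstract consumer.
class Batcher<T> {
    private final int batchSize;
    private final Consumer<List<T>> downstream;
    private List<T> buffer = new ArrayList<>();

    Batcher(int batchSize, Consumer<List<T>> downstream) {
        this.batchSize = batchSize;
        this.downstream = downstream;
    }

    // Called where the wrapped spout/bolt would emit a single tuple.
    void emit(T tuple) {
        buffer.add(tuple);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Send the current (possibly partial) batch downstream.
    void flush() {
        if (!buffer.isEmpty()) {
            downstream.accept(buffer);
            buffer = new ArrayList<>();
        }
    }
}

// Receiver side: unpack each batch and hand the tuples to the user code
// one by one, preserving tuple-by-tuple processing semantics.
class Unbatcher<T> {
    private final Consumer<T> userBolt;

    Unbatcher(Consumer<T> userBolt) {
        this.userBolt = userBolt;
    }

    void execute(List<T> batch) {
        for (T tuple : batch) {
            userBolt.accept(tuple);
        }
    }
}
```

Note that for a fields grouping, an actual implementation would presumably
have to keep one buffer per output partition, so that all tuples within a
batch are routed to the same consumer task; that is what a batching-aware
grouping would need to take care of.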


-Matthias



On 05/28/2015 03:20 PM, Bobby Evans wrote:
> Have you thought about contributing this back to storm itself?  From what I 
> have read and a quick pass through the code it looks like from a user 
> perspective you replace one builder with another.  From a code perspective it 
> looks like you replace the fields grouping with one that understands the 
> batching semantics, and wrap the bolts/spouts with batch/unbatch logic.  This 
> feels like something that could easily fit into storm with minor modification 
> and give users more control over latency vs. throughput in their topologies.  
> Making it an official part of storm too, would allow us to update the metrics 
> system to understand the batching and display results on a per tuple basis 
> instead of on a per batch basis.
>  - Bobby
>  
> 
> 
>      On Thursday, May 28, 2015 5:54 AM, Matthias J. Sax 
> <[email protected]> wrote:
>    
> 
>  Hi Manu,
> 
> please find a simple benchmark evaluation on Storm 0.9.3 at the
> following link (it is too much content to attach to this email).
> 
> https://www2.informatik.hu-berlin.de/~saxmatti/storm-aeolus-benchmark/batchingBenchmark-spout-batching-0.pdf
> 
> The file shows the results for batch sizes 0 to 4. You can replace the
> last "0" in the URL with values up to 16 to get results for higher batch sizes.
> 
> The basic observation is that the maximum achieved data rate in the
> non-batching case is about 250,000 tuples per second (tps), while a
> batch size of about 30 increases it to 2,000,000 tps (with high
> fluctuation, which decreases with even larger batch sizes).
> 
> The benchmark uses a single spout (dop=1) and a single bolt (dop=1) and
> measures the output/input rate (in tps) as well as the network traffic
> (in KB/s) for different batch sizes.
> 
> The spout emits simple single-attribute tuples (of type Integer) and is
> configured to emit at a dedicated (stable) output rate. We did
> multiple runs, combining different output rates (from
> 200,000 tps to 2,000,000 tps in steps of 200,000) with different batch
> sizes (from 1 to 80).
> 
> Each run used a different configured spout output rate and
> consists of 4 plots showing the measured network traffic and output/input
> rates for spout and bolt. The plots might be hard to read (they were
> designed for our own use, not for publishing). If you have questions
> about them, please let me know.
> 
> We ran the experiment on our local cluster. Each node has two Xeon
> E5-2620 2GHz CPUs with 6 cores each and 24GB of main memory. The nodes
> are connected via 1Gbit Ethernet (10Gbit switch).
> 
> The code and scripts for running the benchmark are on GitHub, too;
> please refer to the Maven module "monitoring". You should thus be able
> to run the benchmark on your own hardware.
> 
> -Matthias
> 
> 
> 
> On 05/28/2015 08:44 AM, Manu Zhang wrote:
>> Hi Matthias,
>>
>> The project looks interesting. Any detailed performance data compared with
>> latest storm versions (0.9.3 / 0.9.4) ?
>>
>> Thanks,
>> Manu Zhang
>>
>> On Tue, May 26, 2015 at 11:52 PM, Matthias J. Sax <
>> [email protected]> wrote:
>>
>>> Dear Storm community,
>>>
>>> we would like to share our project Aeolus with you. While the project is
>>> not finished, our first component --- a transparent batching layer ---
>>> is available now.
>>>
>>> Aeolus' batching component is a transparent layer that can increase
>>> Storm's throughput by an order of magnitude while keeping tuple-by-tuple
>>> processing semantics. Batching happens transparently to both the system
>>> and the user code; thus, it can be used without changing existing code.
>>>
>>> Aeolus is available under the Apache License 2.0, and we would be happy
>>> about any feedback. If you would like to try it out, you can download
>>> Aeolus from our git repository:
>>>         https://github.com/mjsax/aeolus
>>>
>>>
>>> Happy hacking,
>>>   Matthias
>>>
>>>
>>
