Hi Bobby, I had never thought about it, but if the community is interested, I would be happy to contribute it. :)
However, I am not super familiar with the actual structure of Storm's code base, and I would need some pointers to integrate it into the system correctly and nicely. I claim to understand the internals of Storm quite well; however, so far I have more of a user perspective on the system. If I should work on it, it might be a good idea to open a JIRA and assign it to me, and we can take it from there?

-Matthias

On 05/28/2015 03:20 PM, Bobby Evans wrote:
> Have you thought about contributing this back to Storm itself? From what I
> have read and a quick pass through the code, it looks like from a user
> perspective you replace one builder with another. From a code perspective it
> looks like you replace the fields grouping with one that understands the
> batching semantics, and wrap the bolts/spouts with batch/unbatch logic. This
> feels like something that could easily fit into Storm with minor modification
> and give users more control over latency vs. throughput in their topologies.
> Making it an official part of Storm would also allow us to update the metrics
> system to understand the batching and display results on a per-tuple basis
> instead of on a per-batch basis.
>
> - Bobby
>
>
> On Thursday, May 28, 2015 5:54 AM, Matthias J. Sax
> <[email protected]> wrote:
>
> Hi Manu,
>
> please find a simple benchmark evaluation on Storm 0.9.3 using the
> following links (it's too much content to attach to this email).
>
> https://www2.informatik.hu-berlin.de/~saxmatti/storm-aeolus-benchmark/batchingBenchmark-spout-batching-0.pdf
>
> The file shows the results for batch sizes 0 to 4. You can replace the
> last "0" with values up to 16 to get results for higher batch sizes.
>
> What you can basically observe is that the maximum achieved data rate
> in the non-batching case is about 250,000 tuples per second (tps), while a
> batch size of about 30 increases it to 2,000,000 tps (with high
> fluctuation; the fluctuation decreases with even higher batch sizes).
> The benchmark uses a single spout (dop=1) and a single bolt (dop=1) and
> measures the output/input rate (in tps) as well as the network traffic (in
> KB/s) for different batch sizes.
>
> The spout emits simple single-attribute tuples (type Integer) and is
> configured to emit at a dedicated (stable) output rate. We did
> multiple runs in the benchmark, combining different output rates (from
> 200,000 tps to 2,000,000 tps in steps of 200,000) with different batch
> sizes (from 1 to 80).
>
> Each run used a different configured spout output rate and
> consists of 4 plots showing the measured network traffic and output/input
> rate for spout and bolt. The plots might be hard to read (they were
> designed for ourselves only, not for publishing). If you have questions
> about them, please let me know.
>
> We ran the experiment on our local cluster. Each node has two Xeon
> E5-2620 2GHz CPUs with 6 cores and 24GB main memory. The nodes are connected
> via 1Gbit Ethernet (10Gbit switch).
>
> The code and scripts for running the benchmark are on GitHub, too.
> Please refer to the Maven module "monitoring". So you should be able to
> run the benchmark on your own hardware.
>
> -Matthias
>
>
> On 05/28/2015 08:44 AM, Manu Zhang wrote:
>> Hi Matthias,
>>
>> The project looks interesting. Any detailed performance data compared with
>> the latest Storm versions (0.9.3 / 0.9.4)?
>>
>> Thanks,
>> Manu Zhang
>>
>> On Tue, May 26, 2015 at 11:52 PM, Matthias J. Sax <
>> [email protected]> wrote:
>>
>>> Dear Storm community,
>>>
>>> we would like to share our project Aeolus with you. While the project is
>>> not finished, our first component --- a transparent batching layer ---
>>> is available now.
>>>
>>> Aeolus' batching component is a transparent layer that can increase
>>> Storm's throughput by an order of magnitude while keeping tuple-by-tuple
>>> processing semantics. Batching happens transparently to the system and the
>>> user code. Thus, it can be used without changing existing code.
>>>
>>> Aeolus is available under the Apache License 2.0, and we would be happy
>>> about any feedback. If you would like to try it out, you can download
>>> Aeolus from our git repository:
>>> https://github.com/mjsax/aeolus
>>>
>>>
>>> Happy hacking,
>>> Matthias
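[Editor's note: the batch/unbatch wrapping described in Bobby's reply can be sketched, independently of Storm's actual interfaces, roughly as below. Class and method names here are hypothetical illustrations, not Aeolus' or Storm's real API: an output buffer collects single tuples into a batch and only hands the whole batch downstream once it is full, while the receiving side unwraps the batch and processes the tuples one by one, so tuple-by-tuple semantics are preserved for the user code.]

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/**
 * Conceptual sketch of transparent batching (hypothetical names).
 * The sender side buffers tuples into batches; the receiver side
 * unbatches them before invoking the per-tuple user logic.
 */
public class BatchingSketch {

    /** Sender side: collects tuples and emits them as one batch when full. */
    static final class BatchingOutput<T> {
        private final int batchSize;
        private final Consumer<List<T>> downstream;
        private List<T> buffer = new ArrayList<>();

        BatchingOutput(int batchSize, Consumer<List<T>> downstream) {
            this.batchSize = batchSize;
            this.downstream = downstream;
        }

        void emit(T tuple) {
            buffer.add(tuple);
            if (buffer.size() >= batchSize) {
                flush();
            }
        }

        /** Sends any buffered tuples as a (possibly partial) batch. */
        void flush() {
            if (!buffer.isEmpty()) {
                downstream.accept(buffer);
                buffer = new ArrayList<>();
            }
        }
    }

    /** Receiver side: unwraps a batch and forwards the tuples individually. */
    static <T> void unbatch(List<T> batch, Consumer<T> boltLogic) {
        for (T tuple : batch) {
            boltLogic.accept(tuple);
        }
    }

    public static void main(String[] args) {
        List<Integer> received = new ArrayList<>();
        // Wire sender directly to receiver for illustration; in a real
        // topology the batch would travel over the network as one message.
        BatchingOutput<Integer> out =
            new BatchingOutput<>(3, batch -> unbatch(batch, received::add));
        for (int i = 0; i < 7; i++) {
            out.emit(i);   // two full batches of 3 are sent along the way
        }
        out.flush();       // final partial batch containing the last tuple
        System.out.println(received);  // [0, 1, 2, 3, 4, 5, 6]
    }
}
```

Note that a fields grouping that "understands the batching semantics", as Bobby puts it, would additionally have to hash each tuple's key to pick the receiver task before buffering, keeping one such buffer per outgoing task, so that key-based routing still holds even though whole batches travel over the wire.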