If you've built it on top of storm then "external" modules may be a good location.
On Thu, May 28, 2015 at 10:42 AM, Matthias J. Sax <[email protected]> wrote:

> Hi Bobby,
>
> I never thought about it. But if the community is interested in it, I
> would be happy to contribute it. :)
>
> However, I am not very familiar with the actual structure of Storm's
> code base, and I would need some pointers to integrate it into the
> system correctly and cleanly.
>
> I believe I understand the internals of Storm quite well; however, so
> far I have more of a user's perspective on the system.
>
> If I should work on it, it might be a good idea to open a JIRA and
> assign it to me, and we can take it from there?
>
>
> -Matthias
>
>
> On 05/28/2015 03:20 PM, Bobby Evans wrote:
> > Have you thought about contributing this back to Storm itself? From
> > what I have read, and a quick pass through the code, it looks like, from
> > a user perspective, you replace one builder with another. From a code
> > perspective, it looks like you replace the fields grouping with one that
> > understands the batching semantics, and wrap the bolts/spouts with
> > batch/unbatch logic. This feels like something that could easily fit into
> > Storm with minor modification and give users more control over latency vs.
> > throughput in their topologies. Making it an official part of Storm would
> > also allow us to update the metrics system to understand the batching and
> > display results on a per-tuple basis instead of on a per-batch basis.
> >
> > - Bobby
> >
> >
> > On Thursday, May 28, 2015 5:54 AM, Matthias J. Sax <[email protected]> wrote:
> >
> > Hi Manu,
> >
> > please find a simple benchmark evaluation on Storm 0.9.3 at the
> > following link (it is too much content to attach to this email):
> >
> > https://www2.informatik.hu-berlin.de/~saxmatti/storm-aeolus-benchmark/batchingBenchmark-spout-batching-0.pdf
> >
> > The file shows the results for batch sizes 0 to 4. You can replace the
> > last "0" with values up to 16 to get results for higher batch sizes.
> >
> > What you can basically observe is that the maximum achieved data rate
> > in the non-batching case is about 250,000 tuples per second (tps), while
> > a batch size of about 30 increases it to 2,000,000 tps (with high
> > fluctuation that decreases with even higher batch sizes).
> >
> > The benchmark uses a single spout (dop=1) and a single bolt (dop=1) and
> > measures the output/input rate (in tps) as well as the network traffic
> > (in KB/s) for different batch sizes.
> >
> > The spout emits simple single-attribute tuples (type Integer) and is
> > configured to emit at a dedicated (stable) output rate. We did
> > multiple runs in the benchmark, combining different output rates (from
> > 200,000 tps to 2,000,000 tps in steps of 200,000) with different batch
> > sizes (from 1 to 80).
> >
> > Each run used a different configured spout output rate and
> > consists of 4 plots showing the measured network traffic and output/input
> > rate for spout and bolt. The plots might be hard to read (they were
> > designed for ourselves only, not for publishing). If you have questions
> > about them, please let me know.
> >
> > We ran the experiment on our local cluster. Each node has two Xeon
> > E5-2620 2GHz CPUs with 6 cores and 24GB main memory. The nodes are
> > connected via 1Gbit Ethernet (10Gbit switch).
> >
> > The code and scripts for running the benchmark are on GitHub, too.
> > Please refer to the maven module "monitoring". So you should be able to
> > run the benchmark on your own hardware.
> >
> > -Matthias
> >
> >
> > On 05/28/2015 08:44 AM, Manu Zhang wrote:
> >> Hi Matthias,
> >>
> >> The project looks interesting. Any detailed performance data compared
> >> with the latest Storm versions (0.9.3 / 0.9.4)?
> >>
> >> Thanks,
> >> Manu Zhang
> >>
> >> On Tue, May 26, 2015 at 11:52 PM, Matthias J. Sax <[email protected]> wrote:
> >>
> >>> Dear Storm community,
> >>>
> >>> we would like to share our project Aeolus with you.
> >>> While the project is not finished, our first component --- a
> >>> transparent batching layer --- is available now.
> >>>
> >>> Aeolus' batching component is a transparent layer that can increase
> >>> Storm's throughput by an order of magnitude while keeping
> >>> tuple-by-tuple processing semantics. Batching happens transparently
> >>> to the system and the user code. Thus, it can be used without
> >>> changing existing code.
> >>>
> >>> Aeolus is available under the Apache License 2.0 and we would be
> >>> happy about any feedback. If you would like to try it out, you can
> >>> download Aeolus from our git repository:
> >>> https://github.com/mjsax/aeolus
> >>>
> >>>
> >>> Happy hacking,
> >>> Matthias
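[Editor's note: the technique Bobby describes in the thread above (a batching-aware grouping plus batch/unbatch wrapping around user bolts) can be sketched in a few lines. The following is a hypothetical illustration of the general idea, not the actual Aeolus or Storm API: `KeyedBatchingEmitter`, `unbatch`, and the modulo "grouping" are all made up for this sketch. The key design point is one buffer per downstream task, so a batch never mixes tuples destined for different consumers and fields-grouping semantics are preserved.]

```python
from collections import defaultdict

class KeyedBatchingEmitter:
    """Buffers outgoing tuples per downstream task so a single batch never
    mixes tuples destined for different consumers. This preserves the
    semantics of a fields grouping while cutting network transfers."""

    def __init__(self, batch_size, num_tasks, transport):
        self.batch_size = batch_size
        self.num_tasks = num_tasks
        self.transport = transport          # transport(task_id, batch)
        self.buffers = defaultdict(list)    # task_id -> pending tuples

    def emit(self, key, tup):
        task = key % self.num_tasks         # stand-in for fields grouping
        buf = self.buffers[task]
        buf.append(tup)
        if len(buf) >= self.batch_size:     # full batch -> one "network" send
            self.transport(task, list(buf))
            buf.clear()

    def flush(self):                        # e.g. on tick tuple or shutdown
        for task, buf in self.buffers.items():
            if buf:
                self.transport(task, list(buf))
                buf.clear()

def unbatch(batch, bolt_execute):
    """Receiver side: unpack a batch and hand tuples to the user bolt one
    by one, so user code still sees tuple-by-tuple processing."""
    for tup in batch:
        bolt_execute(tup)

# Usage: 8 tuples routed to 2 tasks with batch size 4 -> 2 sends instead of 8.
sends = []
emitter = KeyedBatchingEmitter(4, 2, lambda task, batch: sends.append((task, batch)))
for i in range(8):
    emitter.emit(i, ("value", i))
emitter.flush()

received = []
for _, batch in sends:
    unbatch(batch, received.append)
print(len(sends))        # 2 batch sends replaced 8 per-tuple sends
```

Because the user bolt only ever sees `bolt_execute(tup)` calls, the batching stays invisible to user code, which matches the "transparent to the system and the user code" claim in the announcement.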
