Sorry I didn't respond sooner, things are rather busy :). You should be able to
file a JIRA yourself if you want to; it is open to anyone. Storm has not
documented the code base very well.  The core part of Storm is in the
storm-core sub-project.  It has both Java and Clojure code in it.  The Clojure
code is where most everything happens.  The daemons are located under
storm-core/src/clj/backtype/storm/daemon.  worker.clj and executor.clj are
probably the places where you would want to update metrics and routing.  The
code that creates the topology is in Java.
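
To make the batch/unbatch wrapping from the quoted discussion below a bit more
concrete, here is a rough, framework-free sketch. All class and method names
here are made up; this is not Aeolus or Storm code, just the general idea:

```java
// Minimal sketch of the batch/unbatch idea: the sending side buffers tuples
// and forwards whole batches; the receiving side replays each batch tuple by
// tuple so the user bolt keeps tuple-by-tuple semantics.
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class Batcher<T> {
    private final int batchSize;
    private final Consumer<List<T>> downstream; // stands in for the wrapped emit
    private List<T> buffer = new ArrayList<>();

    Batcher(int batchSize, Consumer<List<T>> downstream) {
        this.batchSize = batchSize;
        this.downstream = downstream;
    }

    // Called once per tuple; forwards a whole batch once the buffer is full.
    void add(T tuple) {
        buffer.add(tuple);
        if (buffer.size() == batchSize) {
            flush();
        }
    }

    // Forwards any partially filled batch (e.g. on shutdown or a tick).
    void flush() {
        if (!buffer.isEmpty()) {
            downstream.accept(buffer);
            buffer = new ArrayList<>();
        }
    }

    // The receiving side "unbatches": it replays the batch tuple by tuple,
    // so the wrapped user bolt never sees the batching.
    static <T> void unbatch(List<T> batch, Consumer<T> userBolt) {
        batch.forEach(userBolt);
    }
}
```

A real wrapper would of course also have to deal with acking and with making
the fields grouping batch-aware, as discussed in the quoted mails below.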
 - Bobby
 


     On Thursday, May 28, 2015 9:46 AM, Matthias J. Sax 
<[email protected]> wrote:
   

 Hi Bobby,

I never thought about it. But if the community is interested in it, I
would be happy to contribute it. :)

However, I am not super familiar with the actual structure of Storm's
code base, and I would need some pointers to integrate it into the system
correctly and nicely.

I claim to understand the internals of Storm quite well; however, so far
I have more of a user perspective on the system.

If I am to work on it, it might be a good idea to open a JIRA and
assign it to me, and we can take it from there.


-Matthias



On 05/28/2015 03:20 PM, Bobby Evans wrote:
> Have you thought about contributing this back to storm itself?  From what I 
> have read and a quick pass through the code it looks like from a user 
> perspective you replace one builder with another.  From a code perspective it 
> looks like you replace the fields grouping with one that understands the 
> batching semantics, and wrap the bolts/spouts with batch/unbatch logic.  This 
> feels like something that could easily fit into storm with minor modification 
> and give users more control over latency vs. throughput in their topologies.  
> Making it an official part of Storm, too, would allow us to update the metrics 
> system to understand the batching and display results on a per-tuple basis 
> instead of on a per-batch basis.
>  - Bobby
>  
> 
> 
>      On Thursday, May 28, 2015 5:54 AM, Matthias J. Sax 
><[email protected]> wrote:
>    
> 
>  Hi Manu,
> 
> please find a simple benchmark evaluation on Storm 0.9.3 using the
> following links (it's too much content to attach to this email).
> 
> https://www2.informatik.hu-berlin.de/~saxmatti/storm-aeolus-benchmark/batchingBenchmark-spout-batching-0.pdf
> 
> The files show the results for batch sizes 0 to 4. You can replace the
> last "0" in the URL with values up to 16 to get results for higher batch sizes.
> 
> What you can basically observe is that the maximum achieved data rate
> in the non-batching case is about 250,000 tuples per second (tps), while a
> batch size of about 30 increases it to 2,000,000 tps (with high
> fluctuation that decreases with even higher batch sizes).
> 
> The benchmark uses a single spout (dop=1) and a single bolt (dop=1) and
> measures the output/input rate (in tps) as well as the network traffic (in
> KB/s) for different batch sizes.
> 
> The spout emits simple single-attribute tuples (of type Integer) and is
> configured to emit at a dedicated (stable) output rate. We did
> multiple runs in the benchmark, combining different output rates (from
> 200,000 tps to 2,000,000 tps in steps of 200,000) with different batch
> sizes (from 1 to 80).
> 
> Each run used a different configured spout output rate and
> consists of 4 plots showing the measured network traffic and output/input
> rate for spout and bolt. The plots might be hard to read (they were
> designed for ourselves only, not for publishing). If you have questions
> about them, please let me know.
> 
> We ran the experiment on our local cluster. Each node has two Xeon
> E5-2620 2GHz CPUs with 6 cores each and 24GB of main memory. The nodes are
> connected via 1Gbit Ethernet (10Gbit switch).
> 
> The code and scripts for running the benchmark are on GitHub, too;
> please refer to the Maven module "monitoring". So you should be able to
> run the benchmark on your own hardware.
> 
> -Matthias
> 
> 
> 
> On 05/28/2015 08:44 AM, Manu Zhang wrote:
>> Hi Matthias,
>>
>> The project looks interesting. Any detailed performance data compared with
>> latest storm versions (0.9.3 / 0.9.4) ?
>>
>> Thanks,
>> Manu Zhang
>>
>> On Tue, May 26, 2015 at 11:52 PM, Matthias J. Sax <
>> [email protected]> wrote:
>>
>>> Dear Storm community,
>>>
>>> we would like to share our project Aeolus with you. While the project is
>>> not finished, our first component --- a transparent batching layer ---
>>> is available now.
>>>
>>> Aeolus' batching component is a transparent layer that can increase
>>> Storm's throughput by an order of magnitude while keeping tuple-by-tuple
>>> processing semantics. Batching happens transparently to the system and the
>>> user code. Thus, it can be used without changing existing code.
>>>
>>> Aeolus is available under the Apache License 2.0, and we would be happy
>>> about any feedback. If you would like to try it out, you can download
>>> Aeolus from our git repository:
>>>        https://github.com/mjsax/aeolus
>>>
>>>
>>> Happy hacking,
>>>  Matthias
>>>
>>>
>>

