Thanks for the input.

Currently, everything is written in Java (I am not familiar with Clojure
-- maybe a good way to get started with it, though ;)). Nathan just
mentioned that the code could be included in "external" modules. Thus,
putting it there might be the easiest way. Which external modules is
Nathan referring to?

I am just wondering how deep the integration into the system should be.
If a deeper integration is the better solution, we should follow that path.

You are the experts. What is the better solution?
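For context, here is a minimal, self-contained Java sketch of the batching idea we have been discussing (all class names are hypothetical, not Aeolus' or Storm's actual API): tuples are collected into batches at the sender and flattened back into single tuples before the user code sees them, so tuple-by-tuple semantics are preserved while the network only transfers whole batches.

```java
import java.util.ArrayList;
import java.util.List;

public class BatchingSketch {

    // Sender side: collects single tuples and emits them as one batch once full.
    static class Batcher {
        private final int batchSize;
        private final List<Integer> buffer = new ArrayList<>();
        private final List<List<Integer>> out = new ArrayList<>();

        Batcher(int batchSize) { this.batchSize = batchSize; }

        void emit(int tuple) {
            buffer.add(tuple);
            if (buffer.size() == batchSize) {
                out.add(new ArrayList<>(buffer)); // one transfer per batch
                buffer.clear();
            }
        }

        List<List<Integer>> batches() { return out; }
    }

    // Receiver side: restores tuple-by-tuple semantics before the user
    // bolt sees the data.
    static List<Integer> unbatch(List<List<Integer>> batches) {
        List<Integer> tuples = new ArrayList<>();
        for (List<Integer> batch : batches) {
            tuples.addAll(batch);
        }
        return tuples;
    }

    public static void main(String[] args) {
        Batcher batcher = new Batcher(3);
        for (int i = 0; i < 9; i++) {
            batcher.emit(i);
        }
        System.out.println("batches=" + batcher.batches().size());
        System.out.println("tuples=" + unbatch(batcher.batches()).size());
    }
}
```

With batch size 3 and 9 emitted tuples, 3 batches cross the wire and unbatching restores all 9 tuples in their original order. The real layer of course also has to handle partial batches, multiple streams, and the fields grouping, but this is the core trade-off between per-tuple latency and per-batch throughput.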

-Matthias



On 06/03/2015 09:19 PM, Bobby Evans wrote:
> Sorry I didn't respond sooner, things are rather busy :). You should be able 
> to file a JIRA yourself if you want to; it is open to anyone. Storm has not 
> documented the code base very well.  The core part of Storm is in the 
> storm-core sub-project.  It has both Java and Clojure code in it.  The 
> Clojure code is where most everything happens.  The daemons are located under 
> storm-core/src/clj/backtype/storm/daemon.  worker.clj and executor.clj are 
> probably the places where you would want to update metrics and routing.  The 
> code that creates the topology is in Java.
>  - Bobby
>  
> 
> 
>      On Thursday, May 28, 2015 9:46 AM, Matthias J. Sax 
> <[email protected]> wrote:
>    
> 
>  Hi Bobby,
> 
> I never thought about it. But if the community is interested in it, I
> would be happy to contribute it. :)
> 
> However, I am not super familiar with the actual structure of Storm's
> code base and I would need some pointers to integrate it into the system
> correctly and nicely.
> 
> I claim to understand the internals of Storm quite well; however, so far
> I have more of a user perspective on the system.
> 
> If I should work on it, it might be a good idea to open a JIRA and
> assign it to me, and we can take it from there?
> 
> 
> -Matthias
> 
> 
> 
> On 05/28/2015 03:20 PM, Bobby Evans wrote:
>> Have you thought about contributing this back to storm itself?  From what I 
>> have read and a quick pass through the code it looks like from a user 
>> perspective you replace one builder with another.  From a code perspective 
>> it looks like you replace the fields grouping with one that understands the 
>> batching semantics, and wrap the bolts/spouts with batch/unbatch logic.  
>> This feels like something that could easily fit into storm with minor 
>> modification and give users more control over latency vs. throughput in 
>> their topologies.  Making it an official part of Storm, too, would allow us 
>> to update the metrics system to understand the batching and display results 
>> on a per tuple basis instead of on a per batch basis.
>>   - Bobby
>>   
>>
>>
>>       On Thursday, May 28, 2015 5:54 AM, Matthias J. Sax 
>> <[email protected]> wrote:
>>     
>>
>>   Hi Manu,
>>
>> please find a simple benchmark evaluation on Storm 0.9.3 using the
>> following link (it's too much content to attach to this email).
>>
>> https://www2.informatik.hu-berlin.de/~saxmatti/storm-aeolus-benchmark/batchingBenchmark-spout-batching-0.pdf
>>
>> The file shows the results for batch sizes 0 to 4. You can replace the
>> trailing "0" with values up to 16 to get results for higher batch sizes.
>>
>> What you can basically observe is that the maximum achieved data rate
>> in the non-batching case is about 250,000 tuples per second (tps), while
>> a batch size of about 30 increases it to 2,000,000 tps (with high
>> fluctuation that decreases with even higher batch sizes).
>>
>> The benchmark uses a single spout (dop=1) and a single bolt (dop=1) and
>> measures the output/input rate (in tps) as well as the network traffic
>> (in KB/s) for different batch sizes.
>>
>> The spout emits simple single-attribute tuples (type Integer) and is
>> configured to emit at a dedicated (stable) output rate. We did
>> multiple runs in the benchmark, combining different output rates (from
>> 200,000 tps to 2,000,000 tps in steps of 200,000) with different batch
>> sizes (from 1 to 80).
>>
>> Each run used a different configured spout output rate and
>> consists of 4 plots showing the measured network traffic and output/input
>> rate for spout and bolt. The plots might be hard to read (they were
>> designed for ourselves only, not for publishing). If you have questions
>> about them, please let me know.
>>
>> We ran the experiment on our local cluster. Each node has two Xeon
>> E5-2620 2GHz CPUs with 6 cores each and 24GB of main memory. The nodes
>> are connected via 1Gbit Ethernet (10Gbit switch).
>>
>> The code and scripts for running the benchmark are on GitHub, too.
>> Please refer to the Maven module "monitoring". So you should be able to
>> run the benchmark on your own hardware.
>>
>> -Matthias
>>
>>
>>
>> On 05/28/2015 08:44 AM, Manu Zhang wrote:
>>> Hi Matthias,
>>>
>>> The project looks interesting. Any detailed performance data compared with
>>> the latest Storm versions (0.9.3 / 0.9.4)?
>>>
>>> Thanks,
>>> Manu Zhang
>>>
>>> On Tue, May 26, 2015 at 11:52 PM, Matthias J. Sax <
>>> [email protected]> wrote:
>>>
>>>> Dear Storm community,
>>>>
>>>> we would like to share our project Aeolus with you. While the project is
>>>> not finished yet, our first component --- a transparent batching layer ---
>>>> is available now.
>>>>
>>>> Aeolus' batching component is a transparent layer that can increase
>>>> Storm's throughput by an order of magnitude while keeping tuple-by-tuple
>>>> processing semantics. Batching happens transparently to the system and
>>>> the user code. Thus, it can be used without changing existing code.
>>>>
>>>> Aeolus is available under the Apache License 2.0, and we would be happy
>>>> about any feedback. If you would like to try it out, you can download
>>>> Aeolus from our git repository:
>>>>         https://github.com/mjsax/aeolus
>>>>
>>>>
>>>> Happy hacking,
>>>>   Matthias
>>>>
>>>>
>>>
>>
>>
>>   
>>
> 
> 
>   
> 
