Hey guys, I might not be able to give you all the details right now, because some of the data is on my colleague's computer, but I'm gonna try :)
We have a 30-machine cluster at SZTAKI with 2 cores each; not a powerhouse, but good for experimenting. We tested both Flink Streaming and Storm with a lot of different settings, ranging from a couple of machines to full cluster utilization. We used the default configuration for both Flink and Storm.

We ran two experiments. The first one was a simple streaming wordcount, implemented exactly the same way in both systems. We ran the streaming jobs for about 15-30 minutes each, but we also tested longer runs. The second test was a streaming pagerank, to measure the performance of the iterations (in Storm you can simply connect cycles in the topology, so we wanted to see how that compares to our out-of-topology record passing). On both examples Flink Streaming was on average about 3-4 times faster than the same implementation in Storm. Fault tolerance was turned off for Storm for the sake of the experiment, because that's still an open issue for us.

You can check out the implementations in this repo: https://github.com/mbalassi/streaming-performance (there have been a lot of API changes this week compared to what you find there). The colleague who implemented these is also working on a neat GUI for running performance tests; it's pretty cool :)

Going back to Ufuk's question about what I meant by the role of the output buffers: one main difference between Storm and Flink Streaming could be the way output buffers are handled. In Storm you set the output buffer size as a number of records, and the default is relatively small for better latency. In Flink the output buffer is typically much larger, which gives you higher throughput (and throughput is what we were measuring). To be able to provide some latency guarantee, we implemented a RecordWriter that flushes the output every predefined number of milliseconds (or when the buffer is full); there is a rough sketch of the idea in the P.P.S. below. This seems to be a better way of handling latency guarantees than presetting the buffer size as a number of records.

I think this is a similar case to the comparison of Storm and Spark. Of course Spark has much higher throughput, but the question is how we can fine-tune the trade-off between latency and throughput; we will need more tests to answer that :) And I think there are more performance improvements to come: we are planning to do topology optimizations similar to the ones in the batch API.

Cheers,
Gyula

On Fri, Aug 8, 2014 at 4:59 PM, Ufuk Celebi <u.cel...@fu-berlin.de> wrote:

>
> On 08 Aug 2014, at 16:07, Kostas Tzoumas <kostas.tzou...@tu-berlin.de>
> wrote:
>
> > Wow! Incredible :-) Can you share more details about the experiments you
> > ran (cluster setup, jobs, etc)?
>
> Same here. :-)
>
> I would be especially interested about what you mean with "partly because
> of the output buffers".
>
> Best wishes,
>
> Ufuk
>
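
P.S. For the curious, this is roughly what the record-count-based buffer tuning looks like on the Storm side (0.9.x-era config keys). The sizes here are just placeholder values for illustration; as said above, we benchmarked with the defaults:

    import backtype.storm.Config;

    public class BufferTuning {
        public static void main(String[] args) {
            Config conf = new Config();
            // Per-executor outgoing queue, sized in number of tuples
            // (must be a power of two); smaller means lower latency,
            // larger means better throughput.
            conf.put(Config.TOPOLOGY_EXECUTOR_SEND_BUFFER_SIZE, 1024);
            // Per-executor incoming queue, also in tuples.
            conf.put(Config.TOPOLOGY_EXECUTOR_RECEIVE_BUFFER_SIZE, 1024);
            // Per-worker queue towards the network transfer thread.
            conf.put(Config.TOPOLOGY_TRANSFER_BUFFER_SIZE, 1024);
            // conf would then be passed along when submitting the topology.
        }
    }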
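
P.P.S. And here is a minimal, self-contained sketch of the timed-flush idea mentioned above. The class and method names are made up for illustration; the real RecordWriter works on serialized network buffers, not on a Java list:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Timer;
    import java.util.TimerTask;

    // Records are collected into a buffer that is flushed either when it
    // fills up (throughput path) or when the timer fires (latency bound).
    public class TimedFlushWriter<T> {

        private final List<T> buffer = new ArrayList<T>();
        private final int capacity;

        public TimedFlushWriter(int capacity, long flushIntervalMillis) {
            this.capacity = capacity;
            // Daemon timer bounds how long a record can sit in a
            // half-empty buffer waiting to be sent.
            new Timer(true).scheduleAtFixedRate(new TimerTask() {
                @Override
                public void run() {
                    flush();
                }
            }, flushIntervalMillis, flushIntervalMillis);
        }

        public synchronized void emit(T record) {
            buffer.add(record);
            if (buffer.size() >= capacity) {
                flush();
            }
        }

        public synchronized void flush() {
            if (buffer.isEmpty()) {
                return;
            }
            // Stand-in for handing the batch over to the network stack.
            System.out.println("Flushing " + buffer.size() + " records");
            buffer.clear();
        }
    }

With this, the flush interval becomes the latency knob and the buffer capacity the throughput knob, which is exactly the trade-off I was talking about.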