Thanks for the detailed explanation! Very nice to hear. :) If your flushing writer does not give you enough control over the trade-off (in general you cannot know how large records will be, right?), we can have a chat about runtime changes for this. I would be happy to help with it.
In theory it should not be a problem, because we don't rely on fixed buffer sizes and always serialize the buffer size with the network data. The only immediate problem that comes to my mind would be when the sent buffer is too large for the receiving buffers, but I think we could come up with a solution. ;)

> On 08 Aug 2014, at 21:15, Gyula Fóra <gyula.f...@gmail.com> wrote:
>
> Hey guys,
>
> I might not be able to give you all the details right now, because some of
> the data is on my colleague's computer, but I'm going to try :)
>
> We have a 30-machine cluster at SZTAKI with 2 cores each; not a
> powerhouse, but good for experimenting.
>
> We tested both Flink Streaming and Storm with a lot of different settings,
> ranging from a couple of machines to full cluster utilization. We used the
> default configuration for both Flink and Storm.
>
> We ran two experiments. The first was a simple streaming wordcount
> example, implemented exactly the same way in both systems. We ran the
> streaming jobs for about 15-30 minutes each for the experiments, but we
> also tested longer runs. The other test was a streaming PageRank to test
> the performance of the iterations (in Storm you can just connect circles
> in the topology, so we wanted to see how it compares to the
> out-of-topology record passing). On both examples Flink Streaming was on
> average about 3-4 times faster than the same implementation in Storm. For
> Storm, fault tolerance was turned off for the sake of the experiment,
> because that's still an open issue for us.
>
> You can check out the implementations in this repo:
>
> https://github.com/mbalassi/streaming-performance
>
> (there have been a lot of API changes this week compared to what you find
> in this repo)
>
> One of our colleagues who implemented these is also working on a neat GUI
> for doing performance tests; it's pretty cool :)
>
> Going back to Ufuk's question regarding what I meant by the role of the
> output buffers: one main difference between Storm and Flink Streaming
> could be the way that output buffers are handled. In Storm you set the
> "output buffer" size as a number of records, which is by default
> relatively small for better latency. In Flink the output buffer is
> typically much larger, which gives you higher throughput (that's what we
> were measuring). To be able to provide some latency constraint, we
> implemented a RecordWriter that flushes the output every predefined number
> of milliseconds (or when the buffer is full). This seems to be a better
> way of providing a latency guarantee than presetting a buffer size as a
> number of records.
>
> I think this is a similar case to the comparison of Storm and Spark. Of
> course Spark has much higher throughput, but the question is how we can
> fine-tune the trade-off between latency and throughput; we will need more
> tests to answer these questions :)
>
> But I think there are more performance improvements to come; we are
> planning on doing similar topology optimizations as the batch API in the
> future.
>
>
> Cheers,
>
> Gyula
>
>
>> On Fri, Aug 8, 2014 at 4:59 PM, Ufuk Celebi <u.cel...@fu-berlin.de> wrote:
>>
>>
>> On 08 Aug 2014, at 16:07, Kostas Tzoumas <kostas.tzou...@tu-berlin.de>
>> wrote:
>>
>>> Wow! Incredible :-) Can you share more details about the experiments you
>>> ran (cluster setup, jobs, etc)?
>>
>> Same here. :-)
>>
>> I would be especially interested in what you mean by "partly because of
>> the output buffers".
>>
>> Best wishes,
>>
>> Ufuk
>>
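P.S. For readers of the archive: the time-based flushing writer Gyula describes above can be sketched roughly as below. This is a self-contained toy, not Flink's actual RecordWriter; all class and method names are illustrative, and the "network channel" is replaced by an in-memory list of sent batches. Records are buffered for throughput, and a background thread forces a flush every `flushIntervalMs` milliseconds to bound latency.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of a record writer that flushes when the buffer is
// full OR when a flush interval has elapsed, whichever comes first.
public class FlushingRecordWriter<T> implements AutoCloseable {
    private final List<T> buffer = new ArrayList<>();
    private final int capacity;          // flush when the buffer is full ...
    private final long flushIntervalMs;  // ... or when this much time passed
    private final List<List<T>> sentBatches = new ArrayList<>(); // stand-in for the network
    private final ReentrantLock lock = new ReentrantLock();
    private final Thread flusher;
    private volatile boolean running = true;

    public FlushingRecordWriter(int capacity, long flushIntervalMs) {
        this.capacity = capacity;
        this.flushIntervalMs = flushIntervalMs;
        // Background thread bounds latency: even a half-empty buffer is
        // shipped after at most flushIntervalMs milliseconds.
        this.flusher = new Thread(() -> {
            while (running) {
                try {
                    TimeUnit.MILLISECONDS.sleep(this.flushIntervalMs);
                } catch (InterruptedException e) {
                    return;
                }
                flush();
            }
        });
        flusher.setDaemon(true);
        flusher.start();
    }

    public void emit(T record) {
        lock.lock();
        try {
            buffer.add(record);
            if (buffer.size() >= capacity) {
                flushLocked(); // size-based flush keeps throughput high
            }
        } finally {
            lock.unlock();
        }
    }

    public void flush() {
        lock.lock();
        try {
            flushLocked();
        } finally {
            lock.unlock();
        }
    }

    private void flushLocked() {
        if (!buffer.isEmpty()) {
            sentBatches.add(new ArrayList<>(buffer)); // "send" the batch downstream
            buffer.clear();
        }
    }

    public List<List<T>> getSentBatches() {
        lock.lock();
        try {
            return new ArrayList<>(sentBatches);
        } finally {
            lock.unlock();
        }
    }

    @Override
    public void close() {
        running = false;
        flusher.interrupt();
        flush();
    }

    public static void main(String[] args) throws Exception {
        try (FlushingRecordWriter<String> w = new FlushingRecordWriter<>(3, 100)) {
            w.emit("a");
            w.emit("b");
            w.emit("c");       // third record fills the buffer -> size-based flush
            w.emit("d");       // stays buffered ...
            Thread.sleep(400); // ... until the timer forces a flush
            System.out.println(w.getSentBatches().size() >= 2);
            System.out.println(w.getSentBatches().get(0));
        }
    }
}
```

The point of the design is that capacity controls throughput while the interval caps worst-case latency, so tuning one does not require shrinking the other.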