Re: Benchmarking Streaming Computation Engines at Yahoo!

Matthias J. Sax Fri, 18 Dec 2015 02:34:01 -0800

Hi,

Flink uses byte buffers to transfer data. If a buffer does not fill
up quickly enough, a timeout is applied and the buffer is transferred
before it fills up. This timeout can be configured:


env.setBufferTimeout(timeoutMillis);

see:
https://ci.apache.org/projects/flink/flink-docs-release-0.8/streaming_guide.html#buffer-timeout

So for low throughput, the latency can be reduced by decreasing this
timeout value, which avoids the extra waiting time you mentioned.
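
To illustrate the effect at different throughputs, here is a minimal
back-of-the-envelope model in plain Java. It is not Flink code: the class
name, the method, and the assumption that a buffer holds a fixed number of
equally sized records arriving at a constant rate are all made up for
illustration. It only shows that the first record in a buffer waits for
whichever comes first, the buffer filling up or the timeout expiring.

```java
// Hypothetical model, not part of Flink. Assumes a constant arrival rate
// and a buffer that holds a fixed number of records.
class BufferTimeoutModel {

    /**
     * Worst-case time in milliseconds that the first record placed in an
     * empty buffer waits before the buffer is sent downstream.
     */
    static long worstCaseWaitMillis(int bufferCapacityRecords,
                                    double recordsPerSecond,
                                    long timeoutMillis) {
        // Time until the remaining (capacity - 1) records arrive and fill the buffer.
        double fillMillis = (bufferCapacityRecords - 1) * 1000.0 / recordsPerSecond;
        // The buffer is sent when it is full or when the timeout fires, whichever is first.
        return (long) Math.min(fillMillis, (double) timeoutMillis);
    }

    public static void main(String[] args) {
        int capacity = 1001;                   // illustrative value
        double[] rates = {100.0, 100_000.0};   // records per second
        long[] timeouts = {100L, 5L};          // milliseconds

        for (double rate : rates) {
            for (long timeout : timeouts) {
                System.out.println("rate=" + rate + "/s, timeout=" + timeout
                        + " ms, worst-case wait="
                        + worstCaseWaitMillis(capacity, rate, timeout) + " ms");
            }
        }
    }
}
```

In this model the timeout dominates at the low rate, so lowering it directly
lowers the added latency; at the high rate the buffer fills before a large
timeout fires and the setting matters much less.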

-Matthias

On 12/18/2015 09:42 AM, 刘键(Basti Liu) wrote:
> Hi Jerry,
> 
> Thanks for the clarification.
> But just for my understanding, the reason why we got the lower latency is the
> "window" mechanism in Flink. I guess the stream in Flink is flushed as one or
> several batches for a window. So at lower throughputs, this leads to extra
> waiting at the source component, and it should be possible to lower the
> latency of Flink by adjusting the configuration.
> Actually, my point here is that if we want to compete with Flink or Spark
> Streaming for at-least-once or exactly-once processing (high throughput and
> low latency), the acking mechanism of Storm needs to be improved. Currently,
> there are too many extra messages for the acking mechanism in Storm.
> Sometimes, the throughput of a topology depends on the throughput of the
> acker.
> 
> Regards
> Basti
> 
> -----Original Message-----
> From: Boyang(Jerry) Peng [mailto:[email protected]] 
> Sent: Friday, December 18, 2015 7:08 AM
> To: [email protected]
> Subject: Re: Benchmarking Streaming Computation Engines at Yahoo!
> 
> Hello Satish,
> One of the experiments we wish to do in the future is to compare Flink with
> checkpointing against Storm with acking. If you look at our results, Storm with
> acking does have lower latency than Flink without checkpointing at lower
> throughputs. The keyword here is lower throughputs. What we were trying to
> say is that Storm, with the optimizations we proposed, can be comparable to
> Flink without checkpointing at higher throughputs even with acking
> turned on.
> Best,
> Jerry
> 
> 
>     On Thursday, December 17, 2015 1:27 PM, Satish Duggana 
> <[email protected]> wrote:
>  
> 
>  Hi Jerry,
> Thanks for updating the blog.
> 
> Storm with acking should be compared with a similar configuration on Flink,
> which may be with checkpointing enabled or some other configuration that
> gives an at-least-once guarantee. But the paragraph below gives the impression
> that Storm with acking is equivalent to Flink without checkpointing, which
> is not right.
> 
> "Without acking, Storm even beat Flink at very high throughput, and we
> expect that with further optimizations like combining bolts, more
> intelligent routing of tuples, and improved acking, Storm with acking
> enabled would compete with Flink at very high throughput too."
> 
> Thanks,
> Satish.
> 
> On Thu, Dec 17, 2015 at 10:47 PM, Boyang(Jerry) Peng <
> [email protected]> wrote:
> 
>> Hello Satish,
>> You are correct, there was a typo.  The sentence should be:
>> "Flink uses a mechanism called checkpointing to guarantee processing.
>> Unless checkpointing is used in the Flink job, Flink offers at most once
>> processing similar to Storm with acking turned OFF.  For the Flink
>> benchmark we did not use checkpointing."
>>
>> We have already fixed the typo on the blog.  Thanks!
>> Best,
>> Boyang Jerry Peng
>>
>>
>>    On Thursday, December 17, 2015 4:12 AM, Satish Duggana <
>> [email protected]> wrote:
>>
>>
>>  Hi Bobby et al.,
>> Thanks for publishing the blog post on “Benchmarking streaming computation
>> engines<
>> http://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at>”.
>> It gives good insights into how different streaming engines perform with the
>> use case mentioned.
>>
>> “Flink uses a mechanism called checkpointing to guarantee processing.
>> Unless checkpointing is used in the Flink job, Flink offers at most once
>> processing similar to Storm with acking turned on.  For the Flink benchmark
>> we did not use checkpointing."
>>
>> The above snippet in your blog was confusing regarding the at-most-once
>> guarantee. My understanding is that Storm gives at-most-once without acking,
>> but the at-least-once guarantee requires acking to be on. So, Storm’s acking
>> should be compared with Flink’s at-least-once guarantee, which may be obtained
>> by enabling checkpointing or any other required configuration. Am I missing
>> anything here?
>>
>> Thanks,
>> Satish.
>>
