[
https://issues.apache.org/jira/browse/FLINK-14118?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16933200#comment-16933200
]
Stephan Ewen commented on FLINK-14118:
--------------------------------------
Great diagnosis and great fix!
Do we have any data as to whether this fix causes any overhead in other cases,
or is this always strictly better?
> Reduce the unnecessary flushing when there is no data available for flush
> -------------------------------------------------------------------------
>
> Key: FLINK-14118
> URL: https://issues.apache.org/jira/browse/FLINK-14118
> Project: Flink
> Issue Type: Improvement
> Components: Runtime / Network
> Reporter: Yingjie Cao
> Priority: Critical
> Labels: pull-request-available
> Fix For: 1.10.0, 1.9.1, 1.8.3
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> The new flush implementation, which works by triggering a netty user event,
> may cause a performance regression compared to the old
> synchronization-based one. Specifically, when there is exactly one
> BufferConsumer in the buffer queue of a subpartition and no new data will be
> added for a while (either because there is no input, or because the operator
> collects data for processing and does not emit records immediately), there
> is no data to send. The OutputFlusher nevertheless keeps notifying that data
> is available and waking up the netty thread, even though the pollBuffer
> method returns no data.
> For some of our production jobs, this incurs 20% to 40% CPU overhead
> compared to the old implementation. We tried to fix the problem by checking
> whether new data is available when flushing; if there is none, the netty
> thread is not notified. This works for our jobs, and the CPU usage falls
> back to the previous level.
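
The check described in the issue can be illustrated with a small, self-contained
model. This is only a sketch under stated assumptions, not Flink's actual code:
Chunk, Subpartition, checkBeforeNotify and the wakeUps counter are hypothetical
names, and the counter merely stands in for the netty user event that wakes the
network thread.

```java
import java.util.ArrayDeque;

final class FlushSketch {

    /** Hypothetical stand-in for a BufferConsumer that is still being written to. */
    static final class Chunk {
        private int written;  // bytes appended by the producer so far
        private int consumed; // bytes already handed to the network thread

        synchronized void append(int bytes) { written += bytes; }

        synchronized boolean hasNewData() { return written > consumed; }

        /** Returns the bytes added since the last poll (0 if there are none). */
        synchronized int poll() {
            int fresh = written - consumed;
            consumed = written;
            return fresh;
        }
    }

    /** Hypothetical, heavily simplified subpartition with a flush() entry point. */
    static final class Subpartition {
        private final ArrayDeque<Chunk> queue = new ArrayDeque<>();
        private final boolean checkBeforeNotify; // true = behaviour proposed in the issue
        private int wakeUps;                     // counts simulated network-thread wake-ups

        Subpartition(boolean checkBeforeNotify) {
            this.checkBeforeNotify = checkBeforeNotify;
        }

        synchronized void add(Chunk chunk) { queue.add(chunk); }

        /** Called periodically by the (simulated) output flusher. */
        synchronized void flush() {
            if (queue.isEmpty()) {
                return;
            }
            // The case from the issue: a single queued chunk with nothing new in it.
            boolean onlyStaleChunk = queue.size() == 1 && !queue.peek().hasNewData();
            if (checkBeforeNotify && onlyStaleChunk) {
                return; // nothing to send, so do not wake the network thread
            }
            wakeUps++; // stands in for triggering the netty user event
        }

        /** Called by the (simulated) network thread after a wake-up. */
        synchronized int pollBuffer() {
            Chunk head = queue.peek();
            return head == null ? 0 : head.poll();
        }

        synchronized int wakeUps() { return wakeUps; }
    }
}
```

In this model, writing once and then calling flush() six times (with one
pollBuffer() after the first flush) produces six wake-ups when checkBeforeNotify
is false and a single wake-up when it is true. Whether the extra check has a
measurable cost in other workloads is not something this sketch can answer.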