[ https://issues.apache.org/jira/browse/FLINK-8625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16360611#comment-16360611 ]
Piotr Nowojski commented on FLINK-8625:
---------------------------------------

I have found one more thing. After fixing the current performance bottlenecks in https://issues.apache.org/jira/browse/FLINK-8581, GC pressure caused by OutputFlusher is now our biggest performance bottleneck. With 1000 output channels, OutputFlusher runs once per 1 ms and enqueues 1000 elements every 1 ms on Netty's internal executor. I presume those objects are piling up and ending up in the old GC generation. This GC pressure causes huge throughput fluctuations (because of long GC pauses), from 20,000 records/ms down to 160 records/ms. Those long GC pauses are quite dangerous, since they can cause job failures.

> Move OutputFlusher thread to Netty scheduled executor
> -----------------------------------------------------
>
>                 Key: FLINK-8625
>                 URL: https://issues.apache.org/jira/browse/FLINK-8625
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Network
>            Reporter: Piotr Nowojski
>            Priority: Major
>
> This will allow us to trigger/schedule the next flush only if we are not
> currently busy.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
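
To illustrate the "schedule the next flush only if we are not currently busy" idea from the issue description, here is a minimal sketch. It is not Flink's actual OutputFlusher code: the class `CoalescingFlusher`, the method `requestFlush`, and the helper `enqueuedWhileBusy` are hypothetical names, and a plain `java.util.concurrent.Executor` stands in for Netty's executor. The point is only that a per-channel "flush pending" flag keeps at most one flush task per channel in the executor queue, instead of one new task per channel per millisecond.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch; not Flink's OutputFlusher and not a Netty API. */
class CoalescingFlusher {
    private final Executor executor;      // assumed stand-in for Netty's executor
    private final Runnable flushAction;   // whatever a flush does for one channel
    private final AtomicBoolean flushPending = new AtomicBoolean(false);

    CoalescingFlusher(Executor executor, Runnable flushAction) {
        this.executor = executor;
        this.flushAction = flushAction;
    }

    /** Requests a flush; returns true only if a new task was enqueued. */
    boolean requestFlush() {
        // Skip the enqueue when a flush for this channel is already waiting.
        if (!flushPending.compareAndSet(false, true)) {
            return false;
        }
        executor.execute(() -> {
            // Clear the flag first so that data written during the flush
            // can trigger a follow-up request.
            flushPending.set(false);
            flushAction.run();
        });
        return true;
    }

    /**
     * Counts the tasks that pile up when every channel requests a flush on
     * every tick while the executor is too busy to run any of them.
     */
    static int enqueuedWhileBusy(int channels, int ticks, boolean coalesce) {
        Queue<Runnable> queue = new ArrayDeque<>();  // never drained: a busy executor
        CoalescingFlusher[] flushers = new CoalescingFlusher[channels];
        for (int c = 0; c < channels; c++) {
            flushers[c] = new CoalescingFlusher(queue::add, () -> { });
        }
        for (int t = 0; t < ticks; t++) {
            for (int c = 0; c < channels; c++) {
                if (coalesce) {
                    flushers[c].requestFlush();
                } else {
                    queue.add(() -> { });  // naive: one new task per channel per tick
                }
            }
        }
        return queue.size();
    }

    public static void main(String[] args) {
        System.out.println("naive:     " + enqueuedWhileBusy(1000, 10, false));
        System.out.println("coalesced: " + enqueuedWhileBusy(1000, 10, true));
    }
}
```

With 1000 channels and 10 ticks of a stalled executor, the naive variant leaves 10,000 queued tasks while the coalescing variant leaves 1000 (one per channel). At the 1 ms interval from the comment, the naive rate works out to 1,000,000 enqueued objects per second, which is the kind of allocation churn that would plausibly feed the GC pressure described above.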