[ 
https://issues.apache.org/jira/browse/FLINK-8625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16360611#comment-16360611
 ] 

Piotr Nowojski commented on FLINK-8625:
---------------------------------------

I have found one more thing. After fixing the current performance bottlenecks 
in https://issues.apache.org/jira/browse/FLINK-8581 , GC pressure caused by 
OutputFlusher is now our biggest performance bottleneck. With 1000 output 
channels and OutputFlusher running once per 1ms, 1000 elements are enqueued 
on Netty's internal executor every 1ms (about one million per second). I 
presume those objects are piling up and ending up in the old GC generation.

This GC pressure is causing huge throughput fluctuations (because of long GC 
pauses), from 20,000 records/ms down to 160 records/ms. Those long GC pauses 
are quite dangerous, since they can cause job failures.
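
One way to bound the number of queued flush tasks, in the spirit of the issue 
description below (schedule the next flush only if we are not currently 
busy), is to keep at most one pending flush task per channel. The following 
is only a minimal sketch under stated assumptions, not the actual Flink 
change: the class CoalescingFlusher and its methods are hypothetical names, 
and a plain java.util.concurrent.Executor stands in for Netty's executor.

```java
import java.util.concurrent.Executor;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical sketch: coalesces flush requests so that at most one
 * flush task per channel is queued on the executor at any time.
 */
class CoalescingFlusher {
    private final Executor executor;    // assumption: stand-in for Netty's executor
    private final Runnable flushAction; // assumption: stand-in for the real channel flush
    private final AtomicBoolean flushPending = new AtomicBoolean(false);
    private final AtomicInteger enqueuedTasks = new AtomicInteger();

    CoalescingFlusher(Executor executor, Runnable flushAction) {
        this.executor = executor;
        this.flushAction = flushAction;
    }

    /**
     * Called by the periodic timer. Returns true if a new task was
     * enqueued, false if a flush was already pending.
     */
    boolean requestFlush() {
        if (!flushPending.compareAndSet(false, true)) {
            // A flush is already queued: skip, so no new task object is allocated.
            return false;
        }
        enqueuedTasks.incrementAndGet();
        executor.execute(() -> {
            // Clear the flag before flushing, so a request that arrives
            // while the flush is running schedules a follow-up flush.
            flushPending.set(false);
            flushAction.run();
        });
        return true;
    }

    /** Number of tasks actually handed to the executor so far. */
    int enqueuedTasks() {
        return enqueuedTasks.get();
    }
}
```

With this scheme, if the executor is busy and 1000 timer ticks arrive before 
the queued task runs, only one task object sits in the queue instead of 1000. 
The trade-off is one compare-and-set per tick on the timer side.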

> Move OutputFlusher thread to Netty scheduled executor
> -----------------------------------------------------
>
>                 Key: FLINK-8625
>                 URL: https://issues.apache.org/jira/browse/FLINK-8625
>             Project: Flink
>          Issue Type: Sub-task
>          Components: Network
>            Reporter: Piotr Nowojski
>            Priority: Major
>
> This will allow us to trigger/schedule the next flush only if we are not 
> currently busy.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
