[
https://issues.apache.org/jira/browse/FLINK-8625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16360611#comment-16360611
]
Piotr Nowojski commented on FLINK-8625:
---------------------------------------
I have found one more thing. After fixing the performance bottlenecks
in https://issues.apache.org/jira/browse/FLINK-8581 , GC pressure
caused by OutputFlusher is now our biggest performance issue.
With 1000 output channels, OutputFlusher running once per 1ms enqueues 1000
elements every 1ms on Netty's internal executor. I presume those objects are
piling up and ending up in the old GC generation.
This GC pressure is causing huge throughput fluctuations (because of long GC
pauses), from 20,000 records/ms down to 160 records/ms. Those long GC pauses
are quite dangerous, since they can cause job failures.
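
To make the numbers concrete, here is a minimal sketch of the enqueue rate and
of one possible way to keep at most one pending flush task per channel. This
is not the actual Flink or Netty code: CoalescingFlush, Channel, requestFlush
and the plain queue standing in for the executor are all hypothetical names
used only for illustration.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.atomic.AtomicBoolean;

/** Illustrative sketch only; these are not Flink's or Netty's classes. */
public class CoalescingFlush {

    /** Flush tasks enqueued per second if every channel gets one task per flush period. */
    public static long tasksPerSecond(int channels, int flushPeriodMillis) {
        return 1000L * channels / flushPeriodMillis;
    }

    /** Hypothetical per-channel guard: at most one flush task is pending at a time. */
    public static final class Channel {
        private final AtomicBoolean flushPending = new AtomicBoolean(false);

        /** Returns true only if a new flush task was actually enqueued. */
        public boolean requestFlush(Queue<Runnable> executorQueue) {
            if (!flushPending.compareAndSet(false, true)) {
                return false; // a flush is already queued, do not allocate another task
            }
            executorQueue.add(() -> {
                // Clear the flag first so data written during the flush can request a new one.
                flushPending.set(false);
                // The real flush of the channel would happen here.
            });
            return true;
        }
    }

    public static void main(String[] args) {
        int channels = 1000;
        int ticks = 5; // five flusher ticks while the executor is busy and drains nothing

        System.out.println("tasks/s at 1ms period: " + tasksPerSecond(channels, 1));

        Queue<Runnable> executorQueue = new ArrayDeque<>(); // stand-in for the executor's task queue
        Channel[] all = new Channel[channels];
        for (int i = 0; i < channels; i++) {
            all[i] = new Channel();
        }
        for (int t = 0; t < ticks; t++) {
            for (Channel c : all) {
                c.requestFlush(executorQueue);
            }
        }
        // Unconditional enqueueing would have queued channels * ticks = 5000 tasks.
        System.out.println("queued with guard: " + executorQueue.size()); // prints 1000
    }
}
```

With 1000 channels and a 1ms period, unconditional enqueueing amounts to
1,000,000 short-lived task objects per second, while the guard caps the backlog
at one task per channel no matter how long the executor stays busy.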
> Move OutputFlusher thread to Netty scheduled executor
> -----------------------------------------------------
>
> Key: FLINK-8625
> URL: https://issues.apache.org/jira/browse/FLINK-8625
> Project: Flink
> Issue Type: Sub-task
> Components: Network
> Reporter: Piotr Nowojski
> Priority: Major
>
> This will allow us to trigger/schedule next flush only if we are not
> currently busy.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)