[
https://issues.apache.org/jira/browse/FLINK-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15847162#comment-15847162
]
Till Rohrmann commented on FLINK-5652:
--------------------------------------
The problem is the following: when a timeout is defined, the {{AsyncWaitOperator}}
registers a {{TriggerTask}} with the {{ProcessingTimeService}}, which will
complete the {{StreamRecordQueueEntry}} with a {{TimeoutException}}. Since the
task holds a strong reference to the entry and there is currently no way to
unregister timers, such a {{TriggerTask}} can prevent a
{{StreamRecordQueueEntry}} from being garbage collected (until the
{{TriggerTask}} has been executed). If we now have a high-throughput stream
with a large timeout (as is the case with the example job), then we can amass a
large number of {{TriggerTasks}} which refer to already completed
{{StreamRecordQueueEntries}}.
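
To illustrate the retention effect in isolation, here is a minimal sketch that uses a plain JDK {{ScheduledThreadPoolExecutor}} as a stand-in for the {{ProcessingTimeService}}. This is an assumption for illustration only, not Flink's actual implementation; {{Entry}} and {{TimerRetentionSketch}} are hypothetical names.

```scala
import java.util.concurrent.{ScheduledThreadPoolExecutor, TimeUnit}
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical stand-in for StreamRecordQueueEntry; this is not Flink's class.
final class Entry(val value: Int) {
  private val done = new AtomicBoolean(false)
  // Returns true only for the first completion (result or timeout).
  def complete(): Boolean = done.compareAndSet(false, true)
  def isDone: Boolean = done.get
}

object TimerRetentionSketch {
  /** Registers one timeout task per entry, completes every entry right away,
    * and returns how many timeout tasks are still queued afterwards. */
  def pendingTimersAfterCompletion(n: Int): Int = {
    val timers = new ScheduledThreadPoolExecutor(1)
    try {
      (1 to n).foreach { i =>
        val entry = new Entry(i)
        // The task captures `entry` strongly, as the TriggerTask is described to do.
        timers.schedule(new Runnable {
          override def run(): Unit = { entry.complete(); () } // would signal the timeout
        }, 1L, TimeUnit.MINUTES)
        entry.complete() // the asynchronous result arrives immediately
      }
      // Every queued task still keeps its (already completed) entry reachable.
      timers.getQueue.size
    } finally {
      timers.shutdownNow()
    }
  }
}
```

In this sketch, calling {{TimerRetentionSketch.pendingTimersAfterCompletion(n)}} should leave all n timeout tasks queued although every entry is already complete, which mirrors the growth reported in the issue.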
This problem can be mitigated by using {{WeakReferences}}, but the underlying
problem of a potentially large set of useless timers remains. In order to
properly solve the problem, I think it would be necessary to unregister timers
for already completed {{StreamRecordQueueEntries}}. Alternatively, we could
register a periodic {{TriggerTask}} which iterates over all
{{StreamRecordQueueEntries}} and checks which of those have timed out.
That way, we would avoid creating a dedicated {{TriggerTask}} for each
{{StreamRecordQueueEntry}}.
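
Both alternatives can be sketched with JDK primitives only. The names {{TimerCleanupSketch}}, {{PendingEntry}} and {{sweep}} are hypothetical, and the use of {{ScheduledThreadPoolExecutor}} is again an assumption standing in for the {{ProcessingTimeService}}, which at this point offers no way to unregister timers.

```scala
import java.util.concurrent.{ConcurrentLinkedQueue, ScheduledThreadPoolExecutor, TimeUnit}

object TimerCleanupSketch {

  /** Alternative 1: keep the handle returned by `schedule` and cancel it as
    * soon as the entry completes. Returns the number of timers still queued. */
  def pendingTimersWithCancellation(n: Int): Int = {
    val timers = new ScheduledThreadPoolExecutor(1)
    // Without this policy, cancelled tasks stay queued until their delay elapses.
    timers.setRemoveOnCancelPolicy(true)
    try {
      (1 to n).foreach { _ =>
        val timeout = timers.schedule(new Runnable {
          override def run(): Unit = () // would complete the entry with a timeout
        }, 1L, TimeUnit.MINUTES)
        timeout.cancel(false) // the result arrived, so unregister the timer
      }
      timers.getQueue.size
    } finally {
      timers.shutdownNow()
    }
  }

  // Alternative 2: no per-entry timer; each entry only records its deadline.
  final case class PendingEntry(value: Int, deadlineMillis: Long)

  /** Removes every entry whose deadline has passed and returns the removed
    * entries, so the caller can complete them with a timeout. */
  def sweep(pending: ConcurrentLinkedQueue[PendingEntry], nowMillis: Long): List[PendingEntry] = {
    val expired = List.newBuilder[PendingEntry]
    val it = pending.iterator()
    while (it.hasNext) {
      val e = it.next()
      if (e.deadlineMillis <= nowMillis) {
        it.remove()
        expired += e
      }
    }
    expired.result()
  }
}
```

For the second alternative, a single periodic task (for example one registered via {{scheduleAtFixedRate}}) would call {{sweep}} with the current time. The trade-off is that a timeout is detected only at the next sweep, so the sweep interval bounds the extra delay.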
> Memory leak in AsyncDataStream
> ------------------------------
>
> Key: FLINK-5652
> URL: https://issues.apache.org/jira/browse/FLINK-5652
> Project: Flink
> Issue Type: Bug
> Components: DataStream API
> Affects Versions: 1.3.0
> Reporter: Dmitry Golubets
>
> When async operation timeout is > 0, the number of StreamRecordQueueEntry
> instances keeps growing.
> It can be easily reproduced with the following code:
> {code}
> val src: DataStream[Int] = env.fromCollection((1 to Int.MaxValue).iterator)
>
> val asyncFunction = new AsyncFunction[Int, Int] with Serializable {
>   override def asyncInvoke(input: Int, collector: AsyncCollector[Int]): Unit = {
>     collector.collect(List(input))
>   }
> }
>
> AsyncDataStream.unorderedWait(src, asyncFunction, 1, TimeUnit.MINUTES, 1).print()
> {code}