[
https://issues.apache.org/jira/browse/FLINK-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15847162#comment-15847162
]
Till Rohrmann edited comment on FLINK-5652 at 1/31/17 5:28 PM:
---------------------------------------------------------------
The problem is the following: When defining a timeout, the {{AsyncWaitOperator}}
registers a {{TriggerTask}} at the {{ProcessingTimeService}} which will
complete the {{StreamRecordQueueEntry}} with a {{TimeoutException}}. Since this
is a strong reference and there is currently no way to unregister timers, such
a {{TriggerTask}} can prevent a {{StreamRecordQueueEntry}} from being garbage
collected (until the {{TriggerTask}} has been executed). If we now have a
high-throughput stream, where each element is processed quickly, combined with
a large timeout (as is the case with the example job), then we can amass a
large number of {{TriggerTask}}s which refer to already completed
{{StreamRecordQueueEntries}}.
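The leak pattern can be sketched with plain {{java.util.concurrent}} primitives. This is a hypothetical stand-in, not Flink's actual classes: the scheduled lambda plays the role of the {{TriggerTask}} and strongly captures its entry for the full timeout duration, even after the entry has completed.

```java
import java.util.concurrent.*;

// Sketch of the leak (illustrative names, not Flink code): a timeout task
// scheduled far in the future holds a strong reference to its queue entry,
// so the entry cannot be garbage collected until the timer fires.
public class TimerLeakSketch {
    static class QueueEntry {
        final byte[] payload = new byte[1024]; // stand-in for record data
        volatile boolean done = false;
    }

    public static void main(String[] args) throws Exception {
        ScheduledExecutorService timerService =
                Executors.newSingleThreadScheduledExecutor();
        QueueEntry entry = new QueueEntry();

        // Analogous to registering a TriggerTask at the ProcessingTimeService:
        // the lambda captures 'entry' strongly until the timeout elapses.
        ScheduledFuture<?> timeout = timerService.schedule(() -> {
            if (!entry.done) {
                System.out.println("timed out");
            }
        }, 1, TimeUnit.MINUTES);

        entry.done = true; // the entry completes almost immediately...
        // ...but without cancelling the timer, the scheduled task would keep
        // pinning 'entry' in memory for the remaining minute.
        System.out.println("cancelled: " + timeout.cancel(true));
        timerService.shutdownNow();
    }
}
```

With many fast elements and a long timeout, such uncancelled timers accumulate, which is exactly the growth the reporter observed.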
This problem can be mitigated by using {{WeakReferences}}, but the underlying
problem of a potentially large set of useless timers remains. In order to
properly solve the problem, I think it would be necessary to unregister timers
for already completed {{StreamRecordQueueEntries}}. Alternatively, we could
register a periodic {{TriggerTask}} which iterates over all
{{StreamRecordQueueEntries}} and checks which of them have timed out. That
way, we would avoid creating a dedicated {{TriggerTask}} for each
{{StreamRecordQueueEntry}}.
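The periodic-sweep alternative could look roughly like this. All names here are illustrative, not Flink's API: a single repeating task scans the pending entries, times out the expired ones, and drops finished entries, so no per-entry timer ever exists.

```java
import java.util.Queue;
import java.util.concurrent.*;

// Sketch of one periodic sweeper replacing a dedicated timer per entry
// (hypothetical structure, not Flink's actual implementation).
public class PeriodicTimeoutSweeper {
    static class Entry {
        final long deadlineMillis;
        volatile boolean completed = false;
        volatile boolean timedOut = false;
        Entry(long deadlineMillis) { this.deadlineMillis = deadlineMillis; }
    }

    public static void main(String[] args) throws Exception {
        Queue<Entry> queue = new ConcurrentLinkedQueue<>();
        long now = System.currentTimeMillis();
        Entry fresh = new Entry(now + 60_000); // still within its timeout
        Entry expired = new Entry(now - 1);    // deadline already passed
        queue.add(fresh);
        queue.add(expired);

        ScheduledExecutorService sweeper =
                Executors.newSingleThreadScheduledExecutor();
        // One periodic task checks every pending entry's deadline and evicts
        // entries that are done, so they become eligible for GC.
        sweeper.scheduleAtFixedRate(() -> {
            long t = System.currentTimeMillis();
            for (Entry e : queue) {
                if (!e.completed && t >= e.deadlineMillis) {
                    e.timedOut = true; // would complete it with a TimeoutException
                }
            }
            queue.removeIf(e -> e.completed || e.timedOut);
        }, 0, 100, TimeUnit.MILLISECONDS);

        Thread.sleep(300);
        sweeper.shutdownNow();
        System.out.println("expired timed out: " + expired.timedOut);
        System.out.println("fresh timed out: " + fresh.timedOut);
    }
}
```

The trade-off is timeout granularity (bounded by the sweep period) in exchange for a constant number of timers regardless of throughput.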
> Memory leak in AsyncDataStream
> ------------------------------
>
> Key: FLINK-5652
> URL: https://issues.apache.org/jira/browse/FLINK-5652
> Project: Flink
> Issue Type: Bug
> Components: DataStream API
> Affects Versions: 1.3.0
> Reporter: Dmitry Golubets
>
> When async operation timeout is > 0, the number of StreamRecordQueueEntry
> instances keeps growing.
> It can be easily reproduced with the following code:
> {code}
> val src: DataStream[Int] = env.fromCollection((1 to Int.MaxValue).iterator)
>
> val asyncFunction = new AsyncFunction[Int, Int] with Serializable {
>   override def asyncInvoke(input: Int, collector: AsyncCollector[Int]): Unit = {
>     collector.collect(List(input))
>   }
> }
>
> AsyncDataStream.unorderedWait(src, asyncFunction, 1, TimeUnit.MINUTES, 1).print()
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)