[
https://issues.apache.org/jira/browse/FLINK-5652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15847162#comment-15847162
]
Till Rohrmann edited comment on FLINK-5652 at 1/31/17 5:28 PM:
---------------------------------------------------------------
The problem is the following: When defining a timeout, the {{AsyncWaitOperator}}
registers a {{TriggerTask}} at the {{ProcessingTimeService}} which will
complete the {{StreamRecordQueueEntry}} with a {{TimeoutException}}. Since this
is a strong reference and there is currently no way to unregister timers, such
a {{TriggerTask}} can prevent a {{StreamRecordQueueEntry}} from being garbage
collected (until the {{TriggerTask}} has been executed). If we now have a
high-throughput stream, where each element is processed quickly, combined with
a large timeout (as is the case with the example job), then we can amass a
large number of {{TriggerTask}}s which refer to already completed
{{StreamRecordQueueEntries}}.
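The leak pattern can be sketched with plain {{java.util.concurrent}} primitives. This is a hypothetical stand-in, not Flink's actual classes: the scheduled lambda plays the role of the {{TriggerTask}} and strongly captures its entry for the full timeout duration, even after the entry has completed.

```java
import java.util.concurrent.*;

// Sketch of the leak (illustrative names, not Flink code): a timeout task
// scheduled far in the future holds a strong reference to its queue entry,
// so the entry cannot be garbage collected until the timer fires.
public class TimerLeakSketch {
    static class QueueEntry {
        final byte[] payload = new byte[1024]; // stand-in for record data
        volatile boolean done = false;
    }

    public static void main(String[] args) throws Exception {
        ScheduledExecutorService timerService =
                Executors.newSingleThreadScheduledExecutor();
        QueueEntry entry = new QueueEntry();

        // Analogous to registering a TriggerTask at the ProcessingTimeService:
        // the lambda captures 'entry' strongly until the timeout elapses.
        ScheduledFuture<?> timeout = timerService.schedule(() -> {
            if (!entry.done) {
                System.out.println("timed out");
            }
        }, 1, TimeUnit.MINUTES);

        entry.done = true; // the entry completes almost immediately...
        // ...but without cancelling the timer, the scheduled task would keep
        // pinning 'entry' in memory for the remaining minute.
        System.out.println("cancelled: " + timeout.cancel(true));
        timerService.shutdownNow();
    }
}
```

With many fast elements and a long timeout, such uncancelled timers accumulate, which is exactly the growth the reporter observed.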
This problem can be mitigated by using {{WeakReferences}}, but the underlying
problem of a potentially large set of useless timers remains. In order to
properly solve the problem, I think it would be necessary to unregister timers
for already completed {{StreamRecordQueueEntries}}. Alternatively, we could
register a periodic {{TriggerTask}} which iterates over all
{{StreamRecordQueueEntries}} and checks which of them have timed out. That
way, we would avoid creating a dedicated {{TriggerTask}} for each
{{StreamRecordQueueEntry}}.
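The periodic-sweep alternative could look roughly like this. All names here are illustrative, not Flink's API: a single repeating task scans the pending entries, times out the expired ones, and drops finished entries, so no per-entry timer ever exists.

```java
import java.util.Queue;
import java.util.concurrent.*;

// Sketch of one periodic sweeper replacing a dedicated timer per entry
// (hypothetical structure, not Flink's actual implementation).
public class PeriodicTimeoutSweeper {
    static class Entry {
        final long deadlineMillis;
        volatile boolean completed = false;
        volatile boolean timedOut = false;
        Entry(long deadlineMillis) { this.deadlineMillis = deadlineMillis; }
    }

    public static void main(String[] args) throws Exception {
        Queue<Entry> queue = new ConcurrentLinkedQueue<>();
        long now = System.currentTimeMillis();
        Entry fresh = new Entry(now + 60_000); // still within its timeout
        Entry expired = new Entry(now - 1);    // deadline already passed
        queue.add(fresh);
        queue.add(expired);

        ScheduledExecutorService sweeper =
                Executors.newSingleThreadScheduledExecutor();
        // One periodic task checks every pending entry's deadline and evicts
        // entries that are done, so they become eligible for GC.
        sweeper.scheduleAtFixedRate(() -> {
            long t = System.currentTimeMillis();
            for (Entry e : queue) {
                if (!e.completed && t >= e.deadlineMillis) {
                    e.timedOut = true; // would complete it with a TimeoutException
                }
            }
            queue.removeIf(e -> e.completed || e.timedOut);
        }, 0, 100, TimeUnit.MILLISECONDS);

        Thread.sleep(300);
        sweeper.shutdownNow();
        System.out.println("expired timed out: " + expired.timedOut);
        System.out.println("fresh timed out: " + fresh.timedOut);
    }
}
```

The trade-off is timeout granularity (bounded by the sweep period) in exchange for a constant number of timers regardless of throughput.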
> Memory leak in AsyncDataStream
> ------------------------------
>
> Key: FLINK-5652
> URL: https://issues.apache.org/jira/browse/FLINK-5652
> Project: Flink
> Issue Type: Bug
> Components: DataStream API
> Affects Versions: 1.3.0
> Reporter: Dmitry Golubets
>
> When async operation timeout is > 0, the number of StreamRecordQueueEntry
> instances keeps growing.
> It can be easily reproduced with the following code:
> {code}
> val src: DataStream[Int] = env.fromCollection((1 to Int.MaxValue).iterator)
>
> val asyncFunction = new AsyncFunction[Int, Int] with Serializable {
>   override def asyncInvoke(input: Int, collector: AsyncCollector[Int]): Unit = {
>     collector.collect(List(input))
>   }
> }
>
> AsyncDataStream.unorderedWait(src, asyncFunction, 1, TimeUnit.MINUTES, 1).print()
> {code}
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)