[
https://issues.apache.org/jira/browse/FLINK-5762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15859874#comment-15859874
]
ASF GitHub Bot commented on FLINK-5762:
---------------------------------------
Github user StephanEwen commented on the issue:
https://github.com/apache/flink/pull/3291
I think this works for now.
In the long run, it would be great to start the timer service only in or after
open(), so that timers registered before then cannot fire early. That way we
could make all state initialization more eager and also avoid the misleading
null errors that are logged when cancellation happens too early.
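To illustrate the idea of a timer service that only starts in or after open(), here is a rough sketch. It is not Flink's API: `DeferredTimerService`, `register()` and `start()` are hypothetical names, and for simplicity every timer is treated as already due, so it fires as soon as the service is started.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical timer service that buffers timers until it is started.
// This is an illustrative assumption, not Flink's actual timer service.
class DeferredTimerService {
    private final List<Runnable> pending = new ArrayList<>();
    private boolean started = false;

    // Registering is always allowed, but nothing fires before start().
    synchronized void register(Runnable timer) {
        if (started) {
            timer.run();
        } else {
            pending.add(timer);
        }
    }

    // Meant to be called in or after open(): fires every buffered timer.
    synchronized void start() {
        started = true;
        for (Runnable timer : pending) {
            timer.run();
        }
        pending.clear();
    }
}
```

With something along these lines, initializeState() could register restored timers eagerly, and none of them would run until open() has finished and start() is called.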
> Protect initializeState() and open() by the same lock.
> ------------------------------------------------------
>
> Key: FLINK-5762
> URL: https://issues.apache.org/jira/browse/FLINK-5762
> Project: Flink
> Issue Type: Bug
> Components: DataStream API
> Affects Versions: 1.3.0
> Reporter: Kostas Kloudas
> Assignee: Kostas Kloudas
> Fix For: 1.3.0
>
>
> Currently, initializeState() is called for all operators in a task without
> holding the checkpoint lock, and before open(). This can lead to problems
> such as the following:
> When we retrieve timers from a checkpoint (e.g. in WindowOperator and, in the
> future, CEP) and re-register them in initializeState(), a timer that fires
> before open() has been called on the downstream operators causes a task
> failure, because the downstream channels are not yet open.
> To avoid this, we can call initializeState() under the same lock as open().
> Both operations are then protected by the same lock, which also keeps timers
> from firing until both have completed.
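The locking scheme described in the issue can be sketched roughly as follows. This is a simplified model, not the actual StreamTask code: `LockedStartupSketch`, `checkpointLock`, `downstreamOpen` and `onTimer()` are hypothetical stand-ins, and the restored timer is modelled as a thread that is due immediately.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for a task; NOT Flink's actual StreamTask.
class LockedStartupSketch {
    // Hypothetical stand-in for the checkpoint lock.
    private final Object checkpointLock = new Object();
    private final List<String> events = new ArrayList<>();
    private boolean downstreamOpen = false;

    // Timer callback: it must acquire the lock before it may fire.
    void onTimer() {
        synchronized (checkpointLock) {
            if (!downstreamOpen) {
                throw new IllegalStateException("timer fired before open()");
            }
            events.add("timer");
        }
    }

    // Restores state and re-registers a timer that is already due.
    private Thread initializeState() {
        events.add("initializeState");
        Thread timer = new Thread(this::onTimer);
        timer.start(); // the callback blocks on checkpointLock until startup is done
        return timer;
    }

    private void open() {
        events.add("open");
        downstreamOpen = true;
    }

    // Runs initializeState() and open() under the same lock, then lets the timer fire.
    List<String> run() {
        try {
            Thread timer;
            synchronized (checkpointLock) {
                timer = initializeState();
                Thread.sleep(50); // window in which an unprotected timer could have fired
                open();
            }
            timer.join();
            return events;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }
}
```

Because the timer callback has to take the same lock, it can only run after both initializeState() and open() have returned, so in this sketch the recorded order is always initializeState, open, timer.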
--
This message was sent by Atlassian JIRA
(v6.3.15#6346)