[
https://issues.apache.org/jira/browse/FLINK-4714?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15891545#comment-15891545
]
Wei-Che Wei commented on FLINK-4714:
------------------------------------
Hi [~till.rohrmann]
I have some ideas about this issue and I would like to know if I can get some
feedback from you.
As I understand it, the goal of this issue is to switch the task state to
{{RUNNING}} only after the {{StreamTask}} has set {{StreamTask.isRunning}} to
true (i.e. all state has been restored and all operations have been prepared),
so that checkpoints won't be aborted.
Here are the possible solutions I have thought of.
1. Run another thread that monitors {{StreamTask.isRunning}} and changes the
task state to {{RUNNING}} once it becomes true. This is more of a workaround
and I don't like it: currently the {{Task}} changes its state proactively,
whereas this implementation would be passive.
2. Add a prepare() method to {{AbstractInvokable}} and override it in
{{StreamTask}} only. Move all preparation work from invoke() to prepare(), and
call prepare() before the state transition in {{Task}}.
3. Same as the second option, but also redefine the invoke() method for all
classes that extend {{AbstractInvokable}}. The original invoke() method is
defined to include all operations and setup, such as I/O stream setup.
I consider the second option sub-optimal, because it effectively moves the
initialization from the {{RUNNING}} state to the {{DEPLOYING}} state.
Therefore, it would be better to redefine invoke() in general rather than
customize it just for {{StreamTask}}.
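To make options 2 and 3 more concrete, here is a minimal sketch of the
ordering I have in mind. All class names below ({{SketchInvokable}},
{{SketchStreamTask}}, {{SketchTask}}, {{ExecutionState}}) are hypothetical
stand-ins and not Flink's actual classes; the only point is that prepare()
finishes before the state switches to {{RUNNING}}.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the task states discussed above.
enum ExecutionState { DEPLOYING, RUNNING, FINISHED, FAILED }

// Hypothetical stand-in for AbstractInvokable with the proposed prepare() hook.
abstract class SketchInvokable {
    // Default is a no-op, so only invokables that restore state override it.
    void prepare() throws Exception {}

    // After the change, invoke() would contain only the actual processing.
    abstract void invoke() throws Exception;
}

// Hypothetical stand-in for StreamTask: restoring state moves into prepare().
class SketchStreamTask extends SketchInvokable {
    private final List<String> log;

    SketchStreamTask(List<String> log) { this.log = log; }

    @Override
    void prepare() { log.add("restore state"); }

    @Override
    void invoke() { log.add("process records"); }
}

// Hypothetical stand-in for Task: it stays in DEPLOYING until prepare() returns.
class SketchTask {
    ExecutionState state = ExecutionState.DEPLOYING;
    private final List<String> log;

    SketchTask(List<String> log) { this.log = log; }

    void run(SketchInvokable invokable) {
        try {
            invokable.prepare();              // still DEPLOYING here
            state = ExecutionState.RUNNING;   // only now may checkpoints be triggered
            log.add("state=RUNNING");
            invokable.invoke();
            state = ExecutionState.FINISHED;
        } catch (Exception e) {
            state = ExecutionState.FAILED;
        }
    }
}

class PrepareSketch {
    public static void main(String[] args) {
        List<String> log = new ArrayList<>();
        SketchTask task = new SketchTask(log);
        task.run(new SketchStreamTask(log));
        System.out.println(log + " -> " + task.state);
    }
}
```

With a no-op default for prepare(), other invokables would keep working
unchanged, which corresponds to option 2. Option 3 would additionally move
their setup code into prepare() as well.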
What do you think?
> Set task state to RUNNING after state has been restored
> -------------------------------------------------------
>
> Key: FLINK-4714
> URL: https://issues.apache.org/jira/browse/FLINK-4714
> Project: Flink
> Issue Type: Improvement
> Components: Distributed Coordination, State Backends, Checkpointing
> Affects Versions: 1.2.0
> Reporter: Till Rohrmann
> Assignee: Wei-Che Wei
>
> The task state is set to {{RUNNING}} as soon as the {{Task}} is executed.
> That, however, happens before the state of the {{StreamTask}} invokable has
> been restored. As a result, the {{CheckpointCoordinator}} starts to trigger
> checkpoints even though the {{StreamTask}} is not ready.
> In order to avoid aborting checkpoints and to start them properly, we should
> switch the task state to {{RUNNING}} after the state has been restored.