[
https://issues.apache.org/jira/browse/FLINK-4912?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15838452#comment-15838452
]
ASF GitHub Bot commented on FLINK-4912:
---------------------------------------
Github user StephanEwen commented on the issue:
https://github.com/apache/flink/pull/3113
Considering the possible state transitions:
## ExecutionState
- `RECONCILING` can only be entered from `CREATED`
Simple:
- `RECONCILING` can go to `RUNNING` if the task was reconciled
- `RECONCILING` can go to `FAILED` if the task was not reconciled
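A minimal sketch of the "simple" transition rules as an explicit table, assuming a reduced, hypothetical `State` enum rather than Flink's real `ExecutionState`. Only the transitions into and out of `RECONCILING` are modeled; the normal `CREATED` -> `SCHEDULED` path and the other states are left out on purpose.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

public class ReconcilingTransitions {

    // Hypothetical, reduced set of execution states (not Flink's ExecutionState).
    enum State { CREATED, RECONCILING, RUNNING, FINISHED, CANCELED, FAILED }

    // Allowed transitions for the "simple" variant, reconciliation-related only.
    private static final Map<State, Set<State>> ALLOWED = new EnumMap<>(State.class);

    static {
        // RECONCILING can only be entered from CREATED.
        ALLOWED.put(State.CREATED, EnumSet.of(State.RECONCILING));
        // RUNNING if the task was reconciled, FAILED if it was not.
        ALLOWED.put(State.RECONCILING, EnumSet.of(State.RUNNING, State.FAILED));
    }

    static boolean isAllowed(State from, State to) {
        return ALLOWED.getOrDefault(from, EnumSet.noneOf(State.class)).contains(to);
    }

    public static void main(String[] args) {
        System.out.println(isAllowed(State.CREATED, State.RECONCILING));  // true
        System.out.println(isAllowed(State.RECONCILING, State.RUNNING));  // true
        System.out.println(isAllowed(State.RECONCILING, State.FINISHED)); // false: the "complex" case
        System.out.println(isAllowed(State.RUNNING, State.RECONCILING));  // false
    }
}
```

Keeping the table explicit makes it easy to see that enabling the "complex" variant would just mean adding `FINISHED` and `CANCELED` to the `RECONCILING` entry.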
Complex:
- For `RECONCILING` to go to `FINISHED` or `CANCELED`, the TaskManager
holding the task would have to report (when registering at the
JobManager) a task that is no longer executing. To do that, the TaskManager
would need to "remember" tasks that completed but for which it did not
receive an acknowledgement of the execution state update from the
JobManager. Is that anticipated?
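For the "complex" case, one way the TaskManager could "remember" such tasks is to keep final states until the JobManager acknowledges them. The sketch below assumes hypothetical names (`UnacknowledgedFinalStates`, `onTaskFinalState`, `onAcknowledged`, `reportOnRegistration`) and plain `String` task ids; none of these are Flink APIs.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

public class UnacknowledgedFinalStates {

    // Hypothetical final states a task can reach.
    enum FinalState { FINISHED, CANCELED, FAILED }

    // Task id -> final state whose update the JobManager has not acknowledged yet.
    private final Map<String, FinalState> pending = new HashMap<>();

    // Called when a task reaches a final state and its state update is sent.
    void onTaskFinalState(String taskId, FinalState state) {
        pending.put(taskId, state);
    }

    // Called when the JobManager acknowledges the update; the task can be forgotten.
    void onAcknowledged(String taskId) {
        pending.remove(taskId);
    }

    // Snapshot reported when registering at a (possibly new) JobManager.
    Map<String, FinalState> reportOnRegistration() {
        return Collections.unmodifiableMap(new HashMap<>(pending));
    }

    public static void main(String[] args) {
        UnacknowledgedFinalStates tm = new UnacknowledgedFinalStates();
        tm.onTaskFinalState("task-1", FinalState.FINISHED);
        tm.onTaskFinalState("task-2", FinalState.CANCELED);
        tm.onAcknowledged("task-1");
        System.out.println(tm.reportOnRegistration()); // {task-2=CANCELED}
    }
}
```

The extra bookkeeping (and the question of when it may be dropped) is part of what makes this variant more involved than the "simple" one.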
## JobStatus
- `RECONCILING` can only be entered from `CREATED`
Simple:
- `RECONCILING` can go to `RUNNING` - if all TaskManagers report their
status and tasks as running
- `RECONCILING` can go to `FAILING` - if not all tasks were reported.
Complex:
- For `RECONCILING` to go to `FINISHED`, we'd need the
`ExecutionState` to be able to go from `RECONCILING` to `FINISHED`.
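A minimal sketch of the "simple" job-level decision, assuming it is taken once the TaskManagers have re-registered (or some reconciliation timeout, not shown, has expired). The `JobStatus` enum here is a hypothetical subset, and tasks are identified by plain strings for illustration only.

```java
import java.util.Set;

public class JobReconciliation {

    // Hypothetical subset of job statuses from the discussion.
    enum JobStatus { CREATED, RECONCILING, RUNNING, FAILING }

    /**
     * expectedTasks: tasks the recovered ExecutionGraph expects to be running.
     * reportedRunning: tasks the TaskManagers reported as running.
     */
    static JobStatus afterReconciliation(Set<String> expectedTasks, Set<String> reportedRunning) {
        // RUNNING only if every expected task was reported as running;
        // otherwise the job goes to FAILING (simple variant, no FINISHED).
        return reportedRunning.containsAll(expectedTasks) ? JobStatus.RUNNING : JobStatus.FAILING;
    }

    public static void main(String[] args) {
        Set<String> expected = Set.of("a", "b", "c");
        System.out.println(afterReconciliation(expected, Set.of("a", "b", "c"))); // RUNNING
        System.out.println(afterReconciliation(expected, Set.of("a", "c")));      // FAILING
    }
}
```

`Set.of` needs Java 9 or later; on older versions a `HashSet` built from `Arrays.asList` would do the same job.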
What do you think about only doing the "simple" option in the first version?
> Introduce RECONCILING state in ExecutionGraph
> ---------------------------------------------
>
> Key: FLINK-4912
> URL: https://issues.apache.org/jira/browse/FLINK-4912
> Project: Flink
> Issue Type: Sub-task
> Components: Distributed Coordination
> Reporter: Stephan Ewen
> Assignee: Zhijiang Wang
>
> This is part of the non-disruptive JobManager failure recovery.
> I suggest adding a {{RECONCILING}} state to both JobStatus and ExecutionState.
> If a job is started on a JobManager for master recovery (tbd how to
> determine that), the {{ExecutionGraph}} and the {{Execution}}s start in the
> {{RECONCILING}} state.
> From {{RECONCILING}}, tasks can go to {{RUNNING}} (execution reconciled with
> TaskManager) or to {{FAILED}}.