Github user zhijiangW commented on the issue:
https://github.com/apache/flink/pull/3340
@StephanEwen , thanks for professional reviews!
I attached the design doc in this jira for the implementation
consideration.
[https://docs.google.com/document/d/1rm3BYQyC-8GrLgdwSUAbkJ0aV8IaCneF6c5dk4G3SXY/edit](https://docs.google.com/document/d/1rm3BYQyC-8GrLgdwSUAbkJ0aV8IaCneF6c5dk4G3SXY/edit)
For your above suggestions:
1. I totally agree with your point of keeping the current final variables
related with **Execution** and **IntermediateResult**, etc. Although I already
tried to avoid concurrent issues for these variables in implementation, it is
still a good choice of creating the new ones to replace the previous ones, and
I am willing to re-implement the related logics for this good idea.
2. In the process of JobManagerRunner **grantLeadership**, considering not
expose the new job leader to TaskManagers before reconcile, so the
reconciliation process in JobManager will be called before
**confirmLeadership** to avoid this issue. As you suggest, it can also make
sense if creating JobManager and ExecutionGraph during reconciliation.
Considering minimum changes, I will keep the current mode for this issue if no
other reasons. What do you think?
3. Two phases of the reconciliation mainly concerns about some actions
during state transition. For example, if transition from **RECONCILING** to
**RUNNING**, it should **sendPartitionInfos** in previous logic and this action
relies on other upstream executions reconciliation. In other words, the
upstream execution reconciliation would try to cache the partition infos in the
**PartialInputChannelDeploymentDescriptor** of downstream execution. And when
the downstream execution switches to **RUNNING**, this partition info will be
sent to it. Because we can not confirm all the sequences of execution
reconciliation, dividing the whole process into two steps can make sense for
that.
But after your latest improvements for **EAGER** **ScheduleMode**, I think
the downstream task is submitted already with final upstream partition infos,
so maybe it is no need to send partition infos for running downstream task
again. If my understanding is correct, I can re-implement this part for merging
the current two phases into one based on TaskManager reports.
In addition , for FAILED or CANCELED reported execution states, it would
fail the whole ExecutionGraph as a result, so we can end the reconciliation
period immediately, then the following reports will be refused and the
corresponding tasks will be failed by TaskManagers. From JobManager side, when
process the first FAILED or CANCELED reported state to trigger failing the
ExecutionGraph, the other RECONCILING state executions will be transition to
CANCELED state directly, no need to send cancel rpc message. What do you think?
I will finish the above modifications during this week, and welcome any
other advices!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---