Github user zhijiangW commented on the issue:

    https://github.com/apache/flink/pull/3340
  
    @StephanEwen , thanks for professional reviews!
    
    I attached the design doc in this jira for the implementation 
consideration. 
    
[https://docs.google.com/document/d/1rm3BYQyC-8GrLgdwSUAbkJ0aV8IaCneF6c5dk4G3SXY/edit](https://docs.google.com/document/d/1rm3BYQyC-8GrLgdwSUAbkJ0aV8IaCneF6c5dk4G3SXY/edit)
    
    For your above suggestions:
    
    1.  I totally agree with your point of keeping the current final variables 
related with **Execution** and **IntermediateResult**, etc. Although I already 
tried to avoid concurrent issues for these variables in implementation, it is 
still a good choice of creating the new ones to replace the previous ones, and 
I am willing to re-implement the related logics for this good idea.
    
    2. In the process of JobManagerRunner **grantLeadership**, considering not 
expose the new job leader to TaskManagers before reconcile, so the 
reconciliation process in JobManager will be called before 
**confirmLeadership** to avoid this issue. As you suggest, it can also make 
sense if creating JobManager and ExecutionGraph during reconciliation. 
Considering minimum changes, I will keep the current mode for this issue if no 
other reasons. What do you think?
    
    3. Two phases of the reconciliation mainly concerns about some actions 
during state transition. For example, if transition from **RECONCILING** to 
**RUNNING**, it should **sendPartitionInfos** in previous logic and this action 
relies on other upstream executions reconciliation. In other words, the 
upstream execution reconciliation would try to cache the partition infos in the 
**PartialInputChannelDeploymentDescriptor** of downstream execution. And when 
the downstream execution switches to **RUNNING**, this partition info will be 
sent to it. Because we can not confirm all the sequences of execution 
reconciliation, dividing the whole process into two steps can make sense for 
that. 
    But after your latest improvements for **EAGER** **ScheduleMode**, I think 
the downstream task is submitted already with final upstream partition infos, 
so maybe it is no need to send partition infos for running downstream task 
again. If my understanding is correct, I can re-implement this part for merging 
the current two phases into one based on TaskManager reports.
    In addition , for FAILED or CANCELED reported execution states, it would 
fail the whole ExecutionGraph as a result, so we can end the reconciliation 
period immediately, then the following reports will be refused and the 
corresponding tasks will be failed by TaskManagers. From JobManager side, when 
process the first FAILED or CANCELED reported state to trigger failing the 
ExecutionGraph, the other RECONCILING state executions will be transition to 
CANCELED state directly, no need to send cancel rpc message. What do you think?
    
    I will finish the above modifications during this week, and welcome any 
other advices! 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

Reply via email to