[ https://issues.apache.org/jira/browse/TEZ-1273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14059303#comment-14059303 ]

Hitesh Shah commented on TEZ-1273:
----------------------------------

[~zjffdu] There are several aspects to consider: 
    - when should the AM unregister with the RM?
    - when should the AM do cleanup of its staging data/tmp resources?
    - when should the AM clean up DAG data of a completed/killed DAG?
    - what is the state flow when the AM receives a SIGTERM/kill signal? Do all 
signals translate into shutdowns?
   
Other comments:
   - AM_REBOOT can be received at any point after the RM heartbeat service 
comes up. 
   - Does a failure in recovery count as internal error? 
   - Where does a DAG submission fit in? Is it a state transition or just a 
state check? How do you plan to handle multiple concurrent DAG submissions if 
it is represented as a state transition event?

Also, any thoughts on how we can capture session mode in the state machine 
itself so that we do not need isSession checks all over the place? 
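To make the session-mode question concrete, here is a hypothetical sketch in plain Java (no Tez or YARN classes). Every state, event and class name below is illustrative and is not the actual DAGAppMaster API. It assumes a transition table keyed on (state, event): session mode is consulted only inside the few transitions whose target depends on it, and an event that has no entry for the current state (for example a second DAG submission while RUNNING) is rejected by the table itself rather than by scattered checks.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch only: names are illustrative, not the Tez DAGAppMaster API.
public class AmStateMachineSketch {

    enum AmState { NEW, INITED, RUNNING, IDLE, SHUTDOWN }
    enum AmEvent { INIT, START, DAG_SUBMITTED, DAG_FINISHED, KILL }

    private final boolean isSession;
    private AmState state = AmState.NEW;

    // Transition table: (current state, event) -> function computing the next state.
    private final Map<AmState, Map<AmEvent, Function<AmStateMachineSketch, AmState>>> table =
        new EnumMap<>(AmState.class);

    AmStateMachineSketch(boolean isSession) {
        this.isSession = isSession;
        addTransition(AmState.NEW, AmEvent.INIT, am -> AmState.INITED);
        // Session AMs wait idle for DAGs; a non-session AM starts running its single DAG.
        addTransition(AmState.INITED, AmEvent.START,
            am -> am.isSession ? AmState.IDLE : AmState.RUNNING);
        addTransition(AmState.IDLE, AmEvent.DAG_SUBMITTED, am -> AmState.RUNNING);
        // The only place that consults isSession when a DAG completes.
        addTransition(AmState.RUNNING, AmEvent.DAG_FINISHED,
            am -> am.isSession ? AmState.IDLE : AmState.SHUTDOWN);
        // A kill signal shuts the AM down from any non-terminal state.
        for (AmState s : AmState.values()) {
            if (s != AmState.SHUTDOWN) {
                addTransition(s, AmEvent.KILL, am -> AmState.SHUTDOWN);
            }
        }
    }

    private void addTransition(AmState from, AmEvent event,
                               Function<AmStateMachineSketch, AmState> next) {
        table.computeIfAbsent(from, k -> new EnumMap<>(AmEvent.class)).put(event, next);
    }

    // Applies an event; events with no entry for the current state are rejected.
    AmState handle(AmEvent event) {
        Map<AmEvent, Function<AmStateMachineSketch, AmState>> row = table.get(state);
        if (row == null || !row.containsKey(event)) {
            throw new IllegalStateException("Invalid event " + event + " in state " + state);
        }
        state = row.get(event).apply(this);
        return state;
    }

    AmState getState() {
        return state;
    }

    public static void main(String[] args) {
        AmStateMachineSketch session = new AmStateMachineSketch(true);
        session.handle(AmEvent.INIT);
        session.handle(AmEvent.START);                            // IDLE
        session.handle(AmEvent.DAG_SUBMITTED);                    // RUNNING
        System.out.println(session.handle(AmEvent.DAG_FINISHED)); // IDLE

        AmStateMachineSketch single = new AmStateMachineSketch(false);
        single.handle(AmEvent.INIT);
        single.handle(AmEvent.START);                             // RUNNING
        System.out.println(single.handle(AmEvent.DAG_FINISHED));  // SHUTDOWN
    }
}
```

In this sketch DAG submission is a transition only from IDLE, so "one DAG at a time" falls out of the table; supporting concurrent submissions would need a different model (e.g. per-DAG state machines), which is exactly the open question above.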


> Refactor DAGAppMaster to state machine based
> --------------------------------------------
>
>                 Key: TEZ-1273
>                 URL: https://issues.apache.org/jira/browse/TEZ-1273
>             Project: Apache Tez
>          Issue Type: Improvement
>    Affects Versions: 0.4.0
>            Reporter: Jeff Zhang
>            Assignee: Jeff Zhang
>         Attachments: dag_app_master.pdf
>
>
> Almost all our entities (Vertex, Task, etc.) are state-machine based and 
> written using a formal state machine. But DAGAppMaster is not written on a 
> formal state machine even though it has state-machine-based behavior. This 
> JIRA is for refactoring it to use a formal state machine.



--
This message was sent by Atlassian JIRA
(v6.2#6252)