[ https://issues.apache.org/jira/browse/TEZ-1273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14065514#comment-14065514 ]
Jeff Zhang commented on TEZ-1273:
---------------------------------

Updated the state machine and attached the patch.

Changes to the state machine:
* Separate session and non-session modes into different states.
   Session states: SESSION_IDLE, SESSION_RUNNING
   Non-session states: DAG_RUNNING, DAG_SUCCEED, DAG_KILLED, DAG_FAILED
* Add an intermediate state for termination: TERMINATING. Split termination into 2 transitions: START_TERMINATE, FINAL_TERMINATE.
** In START_TERMINATE, check whether there's a DAG running. If there is one, send DAG_KILL first; otherwise, go to the FINAL_TERMINATE transition. At this stage we can decide whether cleanup can be done.
   Cleanup case: kill from client side
   No-cleanup case: INTERNAL_ERROR, AM_REBOOT (won't do cleanup in the shutdown hook)
** In FINAL_TERMINATE, do the cleanup if necessary (left as a placeholder; there's another ticket tracking this) and stop all the services.

Ran MRRSleep on a local cluster successfully in session mode, in non-session mode, and with recovery.

[~hitesh] Please help review it; I will add unit tests later.

> Refactor DAGAppMaster to state machine based
> --------------------------------------------
>
>                 Key: TEZ-1273
>                 URL: https://issues.apache.org/jira/browse/TEZ-1273
>             Project: Apache Tez
>          Issue Type: Improvement
>    Affects Versions: 0.4.0
>            Reporter: Jeff Zhang
>            Assignee: Jeff Zhang
>         Attachments: Tez-1273.patch, dag_app_master.pdf, dag_app_master2.pdf
>
>
> Almost all our entities (Vertex, Task etc) are state machine based and
> written using a formal state machine. But DAGAppMaster is not written on a
> formal state machine even though it has state machine based behavior. This
> jira is for refactoring it to be state machine based.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
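
The two-phase terminate described in the comment above could be sketched roughly as follows. This is a minimal standalone Java illustration, not code from Tez-1273.patch and not the Tez or YARN state machine API. The class `TerminateSketch`, the `Cause` enum, the `TERMINATED` end state, the `onDagFinished` callback, and the string log are all hypothetical names introduced for the example. It also assumes that SESSION_RUNNING means a DAG is currently running inside a session, which the comment does not state explicitly.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the START_TERMINATE / FINAL_TERMINATE split.
class TerminateSketch {
    // State names come from the comment, except TERMINATED, which is assumed here.
    enum State { SESSION_IDLE, SESSION_RUNNING, DAG_RUNNING, TERMINATING, TERMINATED }

    // Why the AM is terminating; only a client-side kill allows cleanup.
    enum Cause { CLIENT_KILL, INTERNAL_ERROR, AM_REBOOT }

    private State state;
    private boolean cleanupAllowed;
    // Records the actions taken, standing in for real events and service calls.
    final List<String> log = new ArrayList<>();

    TerminateSketch(State initial) {
        this.state = initial;
    }

    State getState() {
        return state;
    }

    // START_TERMINATE: decide on cleanup, then kill a running DAG or finish directly.
    void startTerminate(Cause cause) {
        cleanupAllowed = (cause == Cause.CLIENT_KILL);
        // Assumption: both of these states mean a DAG is currently running.
        boolean dagRunning = (state == State.DAG_RUNNING || state == State.SESSION_RUNNING);
        state = State.TERMINATING;
        if (dagRunning) {
            log.add("DAG_KILL"); // FINAL_TERMINATE follows once the DAG has finished
        } else {
            finalTerminate();
        }
    }

    // Called when the killed DAG has reached a final state.
    void onDagFinished() {
        if (state == State.TERMINATING) {
            finalTerminate();
        }
    }

    // FINAL_TERMINATE: optional cleanup, then stop all services.
    private void finalTerminate() {
        if (cleanupAllowed) {
            log.add("CLEANUP");
        }
        log.add("STOP_SERVICES");
        state = State.TERMINATED;
    }

    public static void main(String[] args) {
        TerminateSketch am = new TerminateSketch(State.DAG_RUNNING);
        am.startTerminate(Cause.INTERNAL_ERROR);
        am.onDagFinished();
        System.out.println(am.log + " -> " + am.getState());
    }
}
```

Keeping TERMINATING as its own state means a DAG completion that arrives during shutdown has a well-defined place to land, instead of racing with the code that stops the services.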