[ https://issues.apache.org/jira/browse/TEZ-1273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14059303#comment-14059303 ]
Hitesh Shah commented on TEZ-1273:
----------------------------------

[~zjffdu]

There are a couple of aspects to consider:
  - when should the AM unregister with the RM?
  - when should the AM do cleanup of its staging data/tmp resources?
  - when should the AM clean up DAG data of a completed/killed DAG?
  - what is the state flow when the AM receives a SIGTERM/kill signal? Do all signals translate into shutdowns?

Other comments:
  - AM_REBOOT can be received at any point after the RM heartbeat service comes up.
  - Does a failure in recovery count as an internal error?
  - Where does a DAG submission fit in? Is it a state transition or just a state check? How do you plan to handle multiple concurrent DAG submissions if it is represented as a state transition event?

Also, any thoughts on how we can capture session mode in the state machine itself so that we do not need isSession checks all over the place?

> Refactor DAGAppMaster to state machine based
> --------------------------------------------
>
>                 Key: TEZ-1273
>                 URL: https://issues.apache.org/jira/browse/TEZ-1273
>             Project: Apache Tez
>          Issue Type: Improvement
>    Affects Versions: 0.4.0
>            Reporter: Jeff Zhang
>            Assignee: Jeff Zhang
>         Attachments: dag_app_master.pdf
>
>
> Almost all our entities (Vertex, Task etc) are state machine based and
> written using a formal state machine. But DAGAppMaster is not written on a
> formal state machine even though it has a state machine based behavior. This
> jira is for refactoring it into state machine based

--
This message was sent by Atlassian JIRA
(v6.2#6252)