[ 
https://issues.apache.org/jira/browse/TEZ-1273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555142#comment-14555142
 ] 

Hitesh Shah commented on TEZ-1273:
----------------------------------

Comments based on diagram v4. 

1) Should there be two events, RECOVER and RECOVER_FAILED, so that recovery 
errors can be handled?
2) Is there no dag cleanup event handling in the failed state? 
3) Registering and unregistering with the RM are not states. Should they be?
4) dag_cleanup_event and new_dag_submitted_event were used as events so that 
the dispatcher drains all events for a given dag before cleanup is triggered. 
Any ideas on how to do this as part of the transition instead of as separate 
events to be handled? This can be a follow-up; it is not needed for this jira.
5) The running state stays in running on events such as internal error and 
shutdown. Should a new terminating state be introduced?
6) Which services should be active and which inactive in the recovering state, 
e.g. DagClientHandler?
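
To make points (1) and (5) concrete, here is a minimal sketch of what a 
separate RECOVER_FAILED event and a terminating state could look like. It is 
not taken from the patch or the diagram: the class name, every state and event 
name other than RECOVER and RECOVER_FAILED, and the transitions themselves are 
assumptions for illustration. It deliberately uses a plain EnumMap transition 
table rather than whatever state machine library the patch uses.

```java
import java.util.EnumMap;
import java.util.Map;

// Illustrative only: names and transitions are hypothetical, not from the patch.
final class AmStateMachineSketch {
    enum State { NEW, RECOVERING, RUNNING, TERMINATING, STOPPED, FAILED }
    enum Event { RECOVER, RECOVER_DONE, RECOVER_FAILED, INTERNAL_ERROR, SHUTDOWN, CLEANUP_DONE }

    // Transition table: current state -> (event -> next state).
    private static final Map<State, Map<Event, State>> TABLE =
        new EnumMap<State, Map<Event, State>>(State.class);

    private static void add(State from, Event on, State to) {
        TABLE.computeIfAbsent(from, s -> new EnumMap<Event, State>(Event.class)).put(on, to);
    }

    static {
        add(State.NEW, Event.RECOVER, State.RECOVERING);
        add(State.RECOVERING, Event.RECOVER_DONE, State.RUNNING);
        // Point (1): recovery errors get their own event and their own arc.
        add(State.RECOVERING, Event.RECOVER_FAILED, State.FAILED);
        // Point (5): errors and shutdown leave running instead of looping on it.
        add(State.RUNNING, Event.INTERNAL_ERROR, State.TERMINATING);
        add(State.RUNNING, Event.SHUTDOWN, State.TERMINATING);
        add(State.TERMINATING, Event.CLEANUP_DONE, State.STOPPED);
    }

    // Returns the next state, or throws if the event is not valid in this state.
    static State next(State from, Event on) {
        Map<Event, State> row = TABLE.get(from);
        State to = (row == null) ? null : row.get(on);
        if (to == null) {
            throw new IllegalStateException("Invalid event " + on + " in state " + from);
        }
        return to;
    }

    public static void main(String[] args) {
        State s = State.NEW;
        Event[] events = { Event.RECOVER, Event.RECOVER_DONE, Event.SHUTDOWN, Event.CLEANUP_DONE };
        for (Event e : events) {
            State to = next(s, e);
            System.out.println(s + " --" + e + "--> " + to);
            s = to;
        }
    }
}
```

With an explicit terminating state, events that arrive after an internal error 
or shutdown are rejected (or could be given their own arcs) instead of being 
handled as if the AM were still running normally.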

Some things which might be useful to document: 

1) What happens when a shutdown signal is received:
     - what happens in each state?
     - what stateful info is tracked across transitions until the final 
shutdown?

2) Same questions as (1) for the following events: 
    - scheduling service error 
    - dag internal error
    - AM state machine internal error
    - dispatcher error: should this go through a shutdown hook or a callback 
to the AM state machine?

> Refactor DAGAppMaster to state machine based
> --------------------------------------------
>
>                 Key: TEZ-1273
>                 URL: https://issues.apache.org/jira/browse/TEZ-1273
>             Project: Apache Tez
>          Issue Type: Improvement
>    Affects Versions: 0.4.0
>            Reporter: Jeff Zhang
>            Assignee: Jeff Zhang
>         Attachments: DAGAppMaster_3.pdf, DAGAppMaster_4.pdf, 
> TEZ-1273-3.patch, TEZ-1273-4.patch, TEZ-1273-5.patch, TEZ-1273-6.patch, 
> TEZ-1273-7.patch, Tez-1273-2.patch, Tez-1273.patch, dag_app_master.pdf, 
> dag_app_master2.pdf
>
>
> Almost all our entities (Vertex, Task, etc.) are state machine based and 
> written using a formal state machine. DAGAppMaster, however, is not written 
> as a formal state machine even though its behavior is state machine based. 
> This jira is for refactoring it to be state machine based.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)