[
https://issues.apache.org/jira/browse/TEZ-1273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14555142#comment-14555142
]
Hitesh Shah commented on TEZ-1273:
----------------------------------
Comments based on diagram v4.
1) Should there be 2 events - RECOVER and RECOVER_FAILED to handle recovery
errors?
2) Is there no dag cleanup event handling in the failed state?
3) register and unregister with RM are not states. Should they be?
4) dag_cleanup_event and new_dag_submitted_event were used as events so that
the dispatcher drains all events for a given dag before cleanup is triggered.
Any ideas on how to do this as part of the transition instead of as events to
be handled? This can be a follow-up; it is not needed for this jira.
5) The running state stays in running on events such as internal error and
shutdown - should a new terminating state be introduced?
6) Which services should be active and which inactive in the recovering state?
e.g. DagClientHandler?
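To make (1) and (5) concrete, here is a rough sketch of what I have in mind. It is plain Java with no Tez or YARN dependencies, and every state and event name in it (INITED, RECOVER_DONE, TERMINATING, CLEANUP_DONE, etc.) is a placeholder of mine, not something taken from the diagram or the patch. The finalState field is one possible answer to the "what stateful info is tracked" question: the reason for stopping is remembered while terminating.

```java
import java.util.EnumMap;
import java.util.Map;

public class AmStateMachineSketch {
    // Hypothetical states and events; illustrative only, not from the patch.
    enum State { INITED, RECOVERING, RUNNING, TERMINATING, FAILED, STOPPED }
    enum Event { RECOVER, RECOVER_DONE, RECOVER_FAILED, INTERNAL_ERROR, SHUTDOWN, CLEANUP_DONE }

    private final Map<State, Map<Event, State>> table = new EnumMap<>(State.class);
    private State current = State.INITED;
    // Remembered while TERMINATING so the final state reflects why we stopped.
    private State finalState = State.STOPPED;

    AmStateMachineSketch() {
        add(State.INITED, Event.RECOVER, State.RECOVERING);
        add(State.RECOVERING, Event.RECOVER_DONE, State.RUNNING);
        // Separate event for recovery errors, as asked in (1).
        add(State.RECOVERING, Event.RECOVER_FAILED, State.FAILED);
        // Leave RUNNING on error/shutdown instead of staying put, as asked in (5).
        add(State.RUNNING, Event.INTERNAL_ERROR, State.TERMINATING);
        add(State.RUNNING, Event.SHUTDOWN, State.TERMINATING);
    }

    private void add(State from, Event on, State to) {
        table.computeIfAbsent(from, k -> new EnumMap<>(Event.class)).put(on, to);
    }

    State handle(Event event) {
        // Cleanup completion resolves TERMINATING into the remembered final state.
        if (current == State.TERMINATING && event == Event.CLEANUP_DONE) {
            current = finalState;
            return current;
        }
        Map<Event, State> row = table.get(current);
        State next = (row == null) ? null : row.get(event);
        if (next == null) {
            throw new IllegalStateException("Invalid event " + event + " in state " + current);
        }
        if (event == Event.INTERNAL_ERROR) {
            finalState = State.FAILED;
        }
        current = next;
        return current;
    }

    State current() {
        return current;
    }

    public static void main(String[] args) {
        AmStateMachineSketch am = new AmStateMachineSketch();
        am.handle(Event.RECOVER);
        am.handle(Event.RECOVER_DONE);
        am.handle(Event.INTERNAL_ERROR);
        System.out.println(am.current());
        am.handle(Event.CLEANUP_DONE);
        System.out.println(am.current());
    }
}
```

With an explicit terminating state, events that arrive after shutdown has started are rejected (or could be ignored) by the table rather than being handled as if the AM were still running.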
Some things which might be useful to document:
1) What happens when a shutdown signal is received
- what happens in each different state?
- what stateful info is tracked across transitions to finally shutdown?
2) Same questions as (1) for the following events:
- scheduling service error
- dag internal error
- AM state machine internal error
- dispatcher error - should this go through a shutdown hook or a callback
to the AM state machine?
> Refactor DAGAppMaster to state machine based
> --------------------------------------------
>
> Key: TEZ-1273
> URL: https://issues.apache.org/jira/browse/TEZ-1273
> Project: Apache Tez
> Issue Type: Improvement
> Affects Versions: 0.4.0
> Reporter: Jeff Zhang
> Assignee: Jeff Zhang
> Attachments: DAGAppMaster_3.pdf, DAGAppMaster_4.pdf,
> TEZ-1273-3.patch, TEZ-1273-4.patch, TEZ-1273-5.patch, TEZ-1273-6.patch,
> TEZ-1273-7.patch, Tez-1273-2.patch, Tez-1273.patch, dag_app_master.pdf,
> dag_app_master2.pdf
>
>
> Almost all our entities (Vertex, Task etc) are state machine based and
> written using a formal state machine. But DAGAppMaster is not written on a
> formal state machine even though it has state machine based behavior. This
> jira is for refactoring it to be state machine based.