[ 
https://issues.apache.org/jira/browse/TEZ-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177597#comment-14177597
 ] 

Siddharth Seth commented on TEZ-1267:
-------------------------------------

Structurally, I think this looks good. There are some things that need to be 
considered, though:
- ROUTE_EVENT_TRANSITIONS from the NEW / INITIALIZING / INITED state. This, 
generally, will not process the relevant event - except for the case of 
VertexManagerEvents. I think it's safe to leave the change as is - i.e. allow a 
transition into FAILED state, but I don't think it'll be triggered, at least 
not for NEW. VertexManagers aren't set up before the NEW state - and we likely 
need to put in checks for this beforehand, as part of a separate jira.
- SOURCE_TASK_ATTEMPT_COMPLETE while in NEW / INITIALIZING / INITED - the 
events will always be cached, and hence should not lead to problems. We could 
choose to leave these transitions as is, or allow the FAILED state.
- Transition from INITED to RUNNING - it's possible for the VertexManager to 
have scheduled tasks before generating an error. In such cases, I think we need 
to try killing any invoked tasks - rather than transitioning directly to FAILED 
state. (Technically, the schedule could be in any of the states - but it 
shouldn't be used before the vertex starts.)
- Minor: Log messages say ", vertexId=" + vertex.logIdentifier + ",". This 
should just be "vertex" instead of "vertexId", since logIdentifier contains both.
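
One possible reading of the INITED to RUNNING point and the log message 
point, as a minimal sketch only. Everything here is hypothetical and not taken 
from the patch or the Tez code base: the class VertexErrorSketch, the 
UserVertexManager interface, the start() method, and the choice of a 
TERMINATING state as the place to wait while scheduled tasks are killed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names come from the Tez code base.
class VertexErrorSketch {
    enum State { INITED, RUNNING, TERMINATING, FAILED }

    /** Stand-in for user-supplied VertexManager code that may schedule tasks. */
    interface UserVertexManager {
        void onVertexStarted(List<Integer> scheduledTasks) throws Exception;
    }

    /**
     * Handles the INITED -> RUNNING transition. An exception thrown by user
     * code is caught here instead of propagating up and killing the AM.
     * Task ids that had to be killed are appended to killedTasks.
     */
    static State start(String logIdentifier, UserVertexManager manager,
                       List<Integer> killedTasks) {
        List<Integer> scheduledTasks = new ArrayList<>();
        try {
            manager.onVertexStarted(scheduledTasks);
            return State.RUNNING;
        } catch (Exception e) {
            // The label is "vertex", not "vertexId": logIdentifier is assumed
            // to already carry both pieces of identifying information.
            System.err.println("Exception in VertexManager, vertex="
                + logIdentifier + ", error=" + e.getMessage());
            if (scheduledTasks.isEmpty()) {
                // Nothing was scheduled, so failing directly is safe.
                return State.FAILED;
            }
            // Tasks were scheduled before the error: kill them first rather
            // than transitioning directly to FAILED.
            killedTasks.addAll(scheduledTasks);
            return State.TERMINATING;
        }
    }
}
```

In this sketch the vertex only fails directly when nothing was scheduled; 
otherwise it passes through an intermediate state while the already scheduled 
tasks are killed.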

These changes are required for the EdgePlugins as well, and also for 
InputInitializers. I think we should get this in, and handle the Edges and 
InputInitializers in a separate jira.

> Exception handling when Routing Events
> --------------------------------------
>
>                 Key: TEZ-1267
>                 URL: https://issues.apache.org/jira/browse/TEZ-1267
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Siddharth Seth
>            Assignee: Jeff Zhang
>            Priority: Critical
>         Attachments: Tez-1267.patch
>
>
> Events are generated by user code. In some places they're also handled by 
> user code within the AM. Currently, exceptions which are generated when 
> handling user code will end up killing the AM (and hence leading to a retry).
> Instead, failure to handle such events should cause the application to fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
