[ 
https://issues.apache.org/jira/browse/TEZ-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14177986#comment-14177986
 ] 

Jeff Zhang commented on TEZ-1267:
---------------------------------

bq. ROUTE_EVENT_TRANSITIONS from the NEW / INITIALIZING / INITED state. This, 
generally, will not process the relevant event - except for the case of 
VertexManagerEvents. I think it's safe to leave the change as is - i.e. allow a 
transition into FAILED state, but I don't think it'll be triggered, at least 
not for NEW. VertexManagers aren't setup before the NEW state - and we likely 
need to put in checks for this before - as part of a separate jira.
Only allow the transition into the FAILED state from INITIALIZING / INITED. Created 
[TEZ-1691|https://issues.apache.org/jira/browse/TEZ-1691] for checking the 
VertexManager before handling a VertexManagerEvent.

bq. SOURCE_TASK_ATTEMPT_COMPLETE while in NEW / INITIALIZING / INITED - the 
events will always be cached, and hence should not lead to problems. We could 
choose to leave these transitions as is, or allow the FAILED state.
Left these transitions as is; no transition to the FAILED state.

bq. Transition from INITED to RUNNING - it's possible for the VertexManager to 
have scheduled tasks before generating an error. In such cases, I think we need 
to try killing any invoked tasks - rather than transitioning directly to FAILED 
state. (Technically, the schedule could be in any of the states - but it 
shouldn't be used before the vertex starts.)
Kill the tasks and transition to TERMINATING if an exception happens during the 
INITED to RUNNING transition.

bq. Minor: Log messages say ", vertexId=" + vertex.logIdentifier + ",". This 
should just be vertex instead of vertexId, since logIdentifier contains both.
Changed it to "vertex".

bq. These changes are required for the EdgePlugins as well, and also for 
InputInitializers. I think we should get this in, and handle the Edges and 
InputInitializers in a separate jira.
Created [TEZ-1689|https://issues.apache.org/jira/browse/TEZ-1689] to track 
exception handling in Edges and InputInitializers.


In addition, the patch makes the following change:
* Move some common logic into finish(), such as adding diagnostics and shutting 
down rootInputInitializerManager.
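
For readers following the thread, the transitions discussed above can be 
sketched roughly as follows. This is a simplified illustration, not the actual 
Tez VertexImpl code or the attached patch: the class VertexSketch, the 
UserVertexManager interface and all field and method names are hypothetical 
stand-ins. It assumes that a user-code exception while routing a 
VertexManagerEvent fails the vertex only from INITIALIZING / INITED, that an 
exception during INITED to RUNNING kills any already scheduled tasks and moves 
to TERMINATING, and that both paths share one finish() helper.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the discussion; NOT the real Tez VertexImpl.
class VertexSketch {
    enum State { NEW, INITIALIZING, INITED, RUNNING, TERMINATING, FAILED }

    /** Stand-in for user-supplied VertexManager code that may throw. */
    interface UserVertexManager {
        void onVertexStarted(VertexSketch vertex) throws Exception;
        void onVertexManagerEvent(String event) throws Exception;
    }

    State state = State.NEW;
    boolean initializerManagerShutDown = false; // models rootInputInitializerManager shutdown
    final List<String> diagnostics = new ArrayList<>();
    final List<String> scheduledTasks = new ArrayList<>();
    final List<String> killedTasks = new ArrayList<>();
    final String logIdentifier = "vertex_1 [v1]"; // assumed to carry both id and name
    private final UserVertexManager manager;

    VertexSketch(UserVertexManager manager) {
        this.manager = manager;
    }

    /** Called by user code to schedule a task. */
    void scheduleTask(String taskId) {
        scheduledTasks.add(taskId);
    }

    /** Routes a VertexManagerEvent; only INITIALIZING / INITED are modeled here. */
    void routeVertexManagerEvent(String event) {
        if (state != State.INITIALIZING && state != State.INITED) {
            return; // other states are left as is in this sketch
        }
        try {
            manager.onVertexManagerEvent(event);
        } catch (Exception e) {
            // A user-code failure fails the vertex instead of escaping to the caller.
            finish(State.FAILED, "Exception in VertexManager, vertex=" + logIdentifier
                    + ", error=" + e.getMessage());
        }
    }

    /** INITED to RUNNING: user code may schedule tasks before it throws. */
    void start() {
        if (state != State.INITED) {
            throw new IllegalStateException("start() requires INITED, was " + state);
        }
        try {
            manager.onVertexStarted(this);
            state = State.RUNNING;
        } catch (Exception e) {
            // Kill whatever was scheduled, then wait in TERMINATING rather than FAILED.
            killedTasks.addAll(scheduledTasks);
            scheduledTasks.clear();
            finish(State.TERMINATING, "Exception in VertexManager, vertex=" + logIdentifier
                    + ", error=" + e.getMessage());
        }
    }

    /** Common logic shared by the failure paths. */
    private void finish(State finalState, String diagnostic) {
        diagnostics.add(diagnostic);
        initializerManagerShutDown = true;
        state = finalState;
    }
}
```

The point of routing both failure paths through finish() is that diagnostics 
and shutdown are handled in one place, whichever state the vertex ends up in.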


> Exception handling when Routing Events
> --------------------------------------
>
>                 Key: TEZ-1267
>                 URL: https://issues.apache.org/jira/browse/TEZ-1267
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Siddharth Seth
>            Assignee: Jeff Zhang
>            Priority: Critical
>         Attachments: Tez-1267.patch
>
>
> Events are generated by user code. In some places they're also handled by 
> user code within the AM. Currently, exceptions which are generated when 
> handling user code will end up killing the AM (and hence leading to a retry).
> Instead, failure to handle such events, should cause the application to fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
