[ 
https://issues.apache.org/jira/browse/TEZ-1267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14179836#comment-14179836
 ] 

Jeff Zhang commented on TEZ-1267:
---------------------------------

[~sseth] Thanks for the review. I have attached a new patch with the following 
changes.

* Wrap the Tez event handling in the method handleRoutedTezEvents() and call it in 
RouteEventTransition. The method is static to keep the change small. There is no 
lock around this method since it is always called on the state machine thread, 
except in scheduleTasks, which already holds the lock
* Create an AMUserCodeException to wrap all exceptions from the AM side. BTW, 
OutputCommitter is another place that can throw exceptions; I will handle it in 
TEZ-1689
* Report VertexManagerException.getCause() back to the client, but log the 
original exception
* Verify the termination cause in the unit test.
* Fix the minor issues in the code
** Remove commented out line in TestExceptionPropagation
** Typo - "VertexManageEvent"
** s/Exception happen(s)/Exception in/
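
To illustrate the wrapping and reporting described in the list above, here is a minimal, self-contained sketch. It is not the code from the patch: the AMUserCodeException class below is a hypothetical stand-in that only borrows the name, and UserEventHandler, Result, and handleRoutedEvents are made-up names for illustration. The idea is that an exception thrown by user code is wrapped, the wrapper is logged, the cause is reported as the diagnostic, and the caller gets a failure result instead of an uncaught exception.

```java
import java.util.Arrays;
import java.util.List;

final class UserCodeExceptionSketch {

    /** Hypothetical stand-in for the AMUserCodeException mentioned above. */
    static class AMUserCodeException extends Exception {
        AMUserCodeException(Throwable cause) {
            super(cause);
        }
    }

    /** Hypothetical interface representing user code that handles events. */
    interface UserEventHandler {
        void onEvent(String event) throws Exception;
    }

    /** Hypothetical result: whether routing failed, and the cause to report. */
    static class Result {
        final boolean failed;
        final Throwable cause;

        Result(boolean failed, Throwable cause) {
            this.failed = failed;
            this.cause = cause;
        }
    }

    /** Invoke user code and wrap anything it throws. */
    static void invokeUserCode(UserEventHandler handler, String event)
            throws AMUserCodeException {
        try {
            handler.onEvent(event);
        } catch (Exception e) {
            throw new AMUserCodeException(e);
        }
    }

    /** Route events; on a user-code failure, stop and return a failed result. */
    static Result handleRoutedEvents(UserEventHandler handler, List<String> events) {
        for (String event : events) {
            try {
                invokeUserCode(handler, event);
            } catch (AMUserCodeException e) {
                // Log the wrapper exception, report only its cause to the caller.
                System.err.println("Exception in routing event " + event + ": " + e);
                return new Result(true, e.getCause());
            }
        }
        return new Result(false, null);
    }

    public static void main(String[] args) {
        Result ok = handleRoutedEvents(ev -> { }, Arrays.asList("e1", "e2"));
        Result bad = handleRoutedEvents(
                ev -> { throw new IllegalStateException("user code failed"); },
                Arrays.asList("e1"));
        System.out.println("ok.failed=" + ok.failed);
        System.out.println("bad.failed=" + bad.failed + ", cause=" + bad.cause);
    }
}
```

In this sketch the caller decides what a failed Result means (for example, failing the vertex or the application); the sketch itself only shows that the exception does not escape the event-handling method.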

bq. Is the entire stack trace too much information for the diagnostic message, 
or is that standard practice for other cases as well ?
The stack trace is also used in errors from I/P/O. Based on user feedback, they 
want to see more context information about the error.


> Exception handling when Routing Events
> --------------------------------------
>
>                 Key: TEZ-1267
>                 URL: https://issues.apache.org/jira/browse/TEZ-1267
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Siddharth Seth
>            Assignee: Jeff Zhang
>            Priority: Critical
>         Attachments: TEZ-1267-2.patch, TEZ-1267-3.patch, TEZ-1267-4.patch, 
> Tez-1267.patch
>
>
> Events are generated by user code. In some places they're also handled by 
> user code within the AM. Currently, exceptions which are generated when 
> handling user code will end up killing the AM (and hence leading to a retry).
> Instead, failure to handle such events, should cause the application to fail.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
