[ 
https://issues.apache.org/jira/browse/TEZ-1689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14182300#comment-14182300
 ] 

Siddharth Seth commented on TEZ-1689:
-------------------------------------

Comments on the patch.
- For all AMUserCodeExceptions thrown from edge events, it will be useful to 
include the source and destination vertices which are connected on the Edge. 
AMUserCodeException could have an additional diagnosticMessage field which 
could be populated and logged each time.
- This is a question. Should setParallelism be throwing Exception or 
AMUserCodeException ? 
- edge.setCustomEdgeManager - returning false. This isn't very useful in 
general, since none of the VertexManagers actually do anything with the return 
value. Instead, we should just throw a RuntimeException - which would 
eventually result in a VertexManager failure and will result in the DAG failing.
- Nit; {code}+          boolean successSetParallelism2; {code} Use 
successSetParallelism after reset.
- DAGImpl changes - error reporting should be using getCause()
- // ingore this exception, should not happen| If this is meant to be ignored, 
the log line should indicate this. Unexpected error, etc. 
- {code}+  // this method would been called by all the task attempts,
+  // should always throw exception in the subsequent task attempt if
+  // exception happen in the first task attempt.{code}
Should the exception be sent to all tasks if there was routing failure for one 
of them ? Instead, should we just let the single task for which there was an 
error FAIL the DAG via regular state machine semantics ? That isolates the one 
which caused the failure as well.

On the InputInitializer, 
- For handleInputInitializerEvent - this should be fairly straightfoward to 
handle. At the moment this is an inline call from within the AsyncDispatcher, 
and will end up causing a RuntimeException. The RuntimeException can be changed 
to a AMUserCodeException which will take care of this.
- For onVertexStateUpdated, this eventually gets invoked from within 
RootInputInitializerManager. Catching exceptions there and sending a 
RootInputInitialzierFailedEvent should be enough to fix this ? May require some 
state machine changes to handle this event on a few more states.
I think it'll be better to do the initializer changes in a separate jira, so 
that this can go in quickly (easier to review subsequent patches from the 
current patch :))

> Exception handling for EdgeManagePlugin and InputInitializer
> ------------------------------------------------------------
>
>                 Key: TEZ-1689
>                 URL: https://issues.apache.org/jira/browse/TEZ-1689
>             Project: Apache Tez
>          Issue Type: Sub-task
>            Reporter: Jeff Zhang
>            Assignee: Jeff Zhang
>            Priority: Critical
>         Attachments: TEZ-1689.patch
>
>
> EdgeManagePlugin and InputInitializer are both user code which could lead 
> exception, we should handle it, fail the DAG and display the exception in 
> client side.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

Reply via email to