[ 
https://issues.apache.org/jira/browse/TEZ-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14715215#comment-14715215
 ] 

Siddharth Seth commented on TEZ-2745:
-------------------------------------

bq. If the exception is due to DAG-related components 
(EdgeManager/VertexManager/InputInitializer), it should just fail the DAG but 
keep the Tez session alive. If the exception is due to AM-related components 
(HistoryServiceLogging), it is not necessary to relaunch the AM. (This is the 
same reasoning by which we don't need to launch another task attempt if the 
last task attempt failed due to ClassNotFound.)

That's ideal. Today a failed DAG causes the AM to die, which should not happen 
for most failures. Do we need to ensure additional cleanup when attempting 
something like this? (This is similar to the case of re-using containers even 
after a task failed, which required ensuring that the task structures get 
cleaned up.)
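
To make the desired behavior concrete, here is a minimal sketch of loading a 
user component reflectively and failing only the DAG when the class cannot be 
loaded. It is not taken from the attached patches: PluginLoadSketch, Dag, 
DagState and loadUserComponent are hypothetical names used for illustration 
and are not Tez APIs.

```java
import java.util.Optional;

public class PluginLoadSketch {

    /** Hypothetical stand-in for a DAG's state; not a Tez type. */
    enum DagState { RUNNING, FAILED }

    /** Hypothetical minimal DAG holder; not a Tez class. */
    static final class Dag {
        DagState state = DagState.RUNNING;
        String diagnostics = "";

        void fail(String reason) {
            state = DagState.FAILED;
            diagnostics = reason;
        }
    }

    /**
     * Tries to instantiate a user-supplied class by name. On any reflective
     * failure (including ClassNotFoundException) or linkage error (including
     * NoClassDefFoundError), the DAG is marked FAILED and the caller keeps
     * running instead of letting the error propagate and kill the process.
     */
    static Optional<Object> loadUserComponent(String className, Dag dag) {
        try {
            Class<?> clazz = Class.forName(className);
            return Optional.of(clazz.getDeclaredConstructor().newInstance());
        } catch (ReflectiveOperationException | LinkageError e) {
            dag.fail("Failed to load user class " + className + ": " + e);
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        Dag dag = new Dag();
        // The class name below is made up and is expected to be missing.
        Optional<Object> plugin =
            loadUserComponent("com.example.MissingEdgeManager", dag);
        System.out.println("loaded=" + plugin.isPresent() + " dag=" + dag.state);
        System.out.println(dag.diagnostics);
    }
}
```

The sketch only covers the failure path at load time. Any cleanup of state 
that the failed DAG leaves behind in the AM would need to be handled 
separately.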

> ClassNotFoundException of user code should fail dag
> ---------------------------------------------------
>
>                 Key: TEZ-2745
>                 URL: https://issues.apache.org/jira/browse/TEZ-2745
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.7.0, 0.5.4, 0.6.2, 0.8.0-alpha
>            Reporter: Jeff Zhang
>            Assignee: Jeff Zhang
>         Attachments: TEZ-2745-1.patch, TEZ-2745-2.patch
>
>
> This ClassNotFoundException is not caught today. The current behavior is 
> that the AM crashes and is relaunched until the max app attempt count is 
> reached. 
> The following user code runs in the AM:
> * EdgeManager
> * VertexManager
> * InputInitializer
> * OutputCommitter
> * Other user pluggable components (like DAGScheduler, HistoryServiceLogging 
> etc.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
