[ 
https://issues.apache.org/jira/browse/TEZ-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716851#comment-14716851
 ] 

Siddharth Seth commented on TEZ-2745:
-------------------------------------

We need to make sure we clean up structures from the previously running DAG - 
all threads from initializers, VertexManagers (VMs) etc. We also need to ensure 
there are no pending operations - the AsyncDispatcher queue could hold pending 
events which were never processed because the DAG failed. Do we handle 
OutOfMemoryError from user code correctly?
Do we currently support running a DAG in an AM where the previous DAG failed? 
Don't we fail the application right now?

Ideally, a failed DAG should not mean the application fails - but that has been 
the model in the past. This needs to be changed for all scenarios (almost all 
DAG failures), rather than just this case.
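
As a rough illustration of the cleanup concern above, here is a minimal Java sketch of releasing per-DAG resources after a failure. It is not Tez code: PerDagResources and all of its methods are hypothetical names, the thread pool stands in for initializer/VertexManager threads, and the plain queue stands in for events left behind in a dispatcher.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical holder for everything that belongs to one running DAG.
final class PerDagResources {
    // Stand-in for threads started by initializers, VertexManagers etc.
    private final ExecutorService userCodePool = Executors.newCachedThreadPool();
    // Stand-in for events still queued in a dispatcher for this DAG.
    private final BlockingQueue<Runnable> pendingEvents = new LinkedBlockingQueue<>();

    void runUserCode(Runnable task) { userCodePool.submit(task); }
    void enqueueEvent(Runnable event) { pendingEvents.add(event); }
    int pendingEventCount() { return pendingEvents.size(); }
    boolean threadsStopped() { return userCodePool.isTerminated(); }

    /**
     * Interrupts user-code threads, waits up to timeoutMillis for them to exit,
     * and discards events that were queued for the failed DAG.
     * Returns the number of discarded events.
     */
    int cleanUpAfterFailure(long timeoutMillis) {
        userCodePool.shutdownNow(); // interrupts running tasks, rejects new ones
        try {
            userCodePool.awaitTermination(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        List<Runnable> dropped = new ArrayList<>();
        pendingEvents.drainTo(dropped); // stale events must not reach the next DAG
        return dropped.size();
    }
}
```

Threads that ignore interrupts would survive shutdownNow(), so a caller would still need to check threadsStopped() before accepting the next DAG.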


> ClassNotFoundException of user code should fail dag
> ---------------------------------------------------
>
>                 Key: TEZ-2745
>                 URL: https://issues.apache.org/jira/browse/TEZ-2745
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.7.0, 0.5.4, 0.6.2, 0.8.0-alpha
>            Reporter: Jeff Zhang
>            Assignee: Jeff Zhang
>         Attachments: TEZ-2745-1.patch, TEZ-2745-2.patch
>
>
> This ClassNotFoundException is currently not caught. The current behavior is 
> that the AM crashes and is relaunched until the max app attempts limit is reached. 
> User code that runs in the AM:
> * EdgeManager
> * VertexManager
> * InputInitializer
> * OutputCommitter
> * Other user pluggable components (like DAGScheduler, HistoryLoggingService 
> etc.)
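
The behavior requested in the issue can be sketched in Java as follows. This is a minimal sketch under stated assumptions, not the attached patch: UserPluginLoader and LoadResult are hypothetical names, and how a caller maps the failure onto a DAG state is left open. The idea is to catch class-loading problems at the point where a user plugin is instantiated and return diagnostics that can fail the DAG, instead of letting the exception propagate and crash the AM.

```java
// Hypothetical helper: loads a user-pluggable class reflectively and reports
// failures as data instead of letting them escape to the caller.
final class UserPluginLoader {

    // Either a created instance or a diagnostics string, never both.
    static final class LoadResult {
        private final Object instance;
        private final String diagnostics;

        private LoadResult(Object instance, String diagnostics) {
            this.instance = instance;
            this.diagnostics = diagnostics;
        }

        boolean isOk() { return instance != null; }
        Object instance() { return instance; }
        String diagnostics() { return diagnostics; }
    }

    static LoadResult load(String className, ClassLoader loader) {
        try {
            Class<?> clazz = Class.forName(className, true, loader);
            Object created = clazz.getDeclaredConstructor().newInstance();
            return new LoadResult(created, null);
        } catch (ReflectiveOperationException | LinkageError e) {
            // ReflectiveOperationException covers ClassNotFoundException and
            // constructor failures; LinkageError covers NoClassDefFoundError.
            return new LoadResult(null,
                "Failed to load user class " + className + ": " + e);
        }
    }
}
```

A caller could then fail only the affected DAG when isOk() is false and attach diagnostics() to the failure, leaving the AM itself running.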



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
