[https://issues.apache.org/jira/browse/TEZ-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14716851#comment-14716851]
Siddharth Seth commented on TEZ-2745:
-------------------------------------
Making sure we clean up structures from the previously running DAG: all
threads from initializers, VertexManagers (VMs), etc. Ensuring there are no
pending operations: the AsyncDispatcher queue could hold events that are never
processed because the DAG failed. Do we handle OutOfMemoryErrors from user code
correctly?
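The two cleanup concerns above can be illustrated with plain java.util.concurrent primitives. This is a minimal sketch, assuming the previous DAG's initializer work runs on an ExecutorService and that events still waiting in the dispatcher can be modelled as a BlockingQueue. It does not model the real AsyncDispatcher, and FailedDagCleanup and its fields are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch: FailedDagCleanup is an illustrative name, not a Tez class.
// The pool stands in for the failed DAG's initializer threads; the queue stands
// in for events still sitting in the AM's dispatcher queue.
class FailedDagCleanup {
    final ExecutorService initializerPool = Executors.newFixedThreadPool(2);
    final BlockingQueue<String> pendingEvents = new LinkedBlockingQueue<>();

    /**
     * Interrupts the failed DAG's initializer threads, waits briefly for them to
     * exit, and removes events that will never be processed because the DAG has
     * already failed. Returns the dropped events so they can be logged.
     */
    List<String> cleanUp() {
        initializerPool.shutdownNow(); // interrupts running tasks, discards queued ones
        try {
            if (!initializerPool.awaitTermination(5, TimeUnit.SECONDS)) {
                System.err.println("Some initializer threads did not exit");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the caller's interrupt status
        }
        List<String> dropped = new ArrayList<>();
        pendingEvents.drainTo(dropped); // leave nothing behind for the next DAG
        return dropped;
    }
}
```

Returning the drained events, rather than silently discarding them, leaves room to log what was lost when the DAG failed.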
Do we currently support running a DAG in an AM where the previous DAG failed?
Don't we fail the application right now?
Ideally, a failed DAG should not mean the application fails, but that has been
the model in the past. This needs to change for all scenarios (almost all
DAG failures), not just this case.
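The model described above, where a failed DAG does not take the application down with it, can be sketched as an AM that records each DAG's outcome independently. This is a hypothetical illustration only: SessionAm and DagState are made-up names, and a real AM would also need the cleanup discussed earlier before starting the next DAG.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: SessionAm is an illustrative name, not a Tez class.
// A failure inside one DAG becomes a FAILED state for that DAG instead of
// propagating out and terminating the AM.
class SessionAm {
    enum DagState { SUCCEEDED, FAILED }

    final List<DagState> history = new ArrayList<>();

    DagState runDag(Runnable dag) {
        DagState state;
        try {
            dag.run();
            state = DagState.SUCCEEDED;
        } catch (RuntimeException e) {
            // Only this DAG fails; the AM keeps running and can accept the next DAG.
            System.err.println("DAG failed: " + e);
            state = DagState.FAILED;
        }
        history.add(state);
        return state;
    }
}
```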
> ClassNotFoundException of user code should fail dag
> ---------------------------------------------------
>
> Key: TEZ-2745
> URL: https://issues.apache.org/jira/browse/TEZ-2745
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.7.0, 0.5.4, 0.6.2, 0.8.0-alpha
> Reporter: Jeff Zhang
> Assignee: Jeff Zhang
> Attachments: TEZ-2745-1.patch, TEZ-2745-2.patch
>
>
> A ClassNotFoundException from user code is not caught right now. The current
> behavior is that the AM crashes and is relaunched until the maximum number of
> app attempts is reached.
> User code that runs in the AM includes:
> * EdgeManager
> * VertexManager
> * InputInitializer
> * OutputCommitter
> * Other user-pluggable components (DAGScheduler, HistoryServiceLogging, etc.)
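What the issue title asks for can be sketched as a wrapper around reflective loading of a user class. A ClassNotFoundException, or a related instantiation or linkage failure, is converted into an error that fails only the current DAG instead of escaping and crashing the AM. UserCodeLoader and DagFailedException are hypothetical names, and this sketch is not taken from the attached patches.

```java
// Hypothetical sketch: UserCodeLoader and DagFailedException are illustrative
// names, not Tez classes.
class UserCodeLoader {
    /** Signals that the current DAG should fail; the AM itself keeps running. */
    static class DagFailedException extends Exception {
        DagFailedException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Loads and instantiates a user class (e.g. a VertexManager) by name. */
    static <T> T instantiate(String className, Class<T> expectedType, ClassLoader loader)
            throws DagFailedException {
        try {
            Class<?> clazz = Class.forName(className, true, loader);
            return expectedType.cast(clazz.getDeclaredConstructor().newInstance());
        } catch (ClassNotFoundException e) {
            // The case from this issue: the class is not on the AM's classpath.
            throw new DagFailedException("User class not found: " + className, e);
        } catch (ReflectiveOperationException | ClassCastException | LinkageError e) {
            // No usable no-arg constructor, the constructor threw, wrong type,
            // or a dependency of the class is missing (NoClassDefFoundError).
            throw new DagFailedException("Cannot instantiate user class " + className, e);
        }
    }
}
```

OutOfMemoryError is not in the catch list (it is a VirtualMachineError, not a LinkageError), so how it should be handled is still the open question from the comment. Note that any error thrown inside a user constructor reaches this code wrapped in an InvocationTargetException.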