[ 
https://issues.apache.org/jira/browse/TEZ-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712400#comment-14712400
 ] 

Jeff Zhang commented on TEZ-2745:
---------------------------------

bq. Should we just set max attempts to 1 ?
That would mean recovery won't work, because there would be no second AM attempt to recover in.

bq. Any other error would likely be a non-recoverable error. Special casing 
ClassNotFoundException or other such future causes may be wasted effort?
If the exception comes from DAG-related components 
(EdgeManager/VertexManager/InputInitializer), it should just fail the DAG but 
keep the Tez session alive. If the exception comes from AM-related components 
(DAGScheduler/HistoryServiceLogging), there is no need to relaunch the AM. (For 
the same reason, we don't launch another task attempt when the last task 
attempt failed with ClassNotFoundException.)

bq. Or is the suggestion to convert ClassNotFoundException to 
AMUserCodeException and handle it like that?
I plan to add a checked exception to the method ReflectionUtils#createClazzInstance 
so that the caller can decide what to do.
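
A rough sketch of that idea, to make the discussion concrete. This is not the actual Tez code or the eventual patch: the exception name UserCodeLoadException, the method signature, and the caller loadDagComponent are all hypothetical, and the returned strings only stand in for "fail the DAG" versus "continue".

```java
final class ReflectionSketch {

    /** Hypothetical checked exception; the name is illustrative only. */
    static class UserCodeLoadException extends Exception {
        UserCodeLoadException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /**
     * Hypothetical stand-in for ReflectionUtils#createClazzInstance.
     * Reflection failures surface as a checked exception instead of an
     * unchecked one, so every caller is forced to choose a policy.
     */
    static <T> T createClazzInstance(String className, Class<T> type)
            throws UserCodeLoadException {
        try {
            Class<?> clazz = Class.forName(className);
            return type.cast(clazz.getDeclaredConstructor().newInstance());
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new UserCodeLoadException("Unable to load class: " + className, e);
        }
    }

    /**
     * Hypothetical caller for a DAG-related component: a load failure is
     * reported as a DAG failure rather than crashing the whole process.
     */
    static String loadDagComponent(String className) {
        try {
            createClazzInstance(className, Object.class);
            return "LOADED";
        } catch (UserCodeLoadException e) {
            return "DAG_FAILED";
        }
    }
}
```

With this shape, a caller that loads an EdgeManager or VertexManager could map the exception to a DAG failure, while a caller that loads an AM-level plugin could choose to shut down without another AM attempt.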






> ClassNotFoundException of user code should fail dag
> ---------------------------------------------------
>
>                 Key: TEZ-2745
>                 URL: https://issues.apache.org/jira/browse/TEZ-2745
>             Project: Apache Tez
>          Issue Type: Bug
>    Affects Versions: 0.7.0, 0.5.4, 0.6.2, 0.8.0-alpha
>            Reporter: Jeff Zhang
>            Assignee: Jeff Zhang
>
> This ClassNotFoundException is not caught now. The current behavior is that the AM 
> crashes and is relaunched until the max app attempts limit is reached. 
> Here is the user code that runs in the AM:
> * EdgeManager
> * VertexManager
> * InputInitializer
> * OutputCommitter
> * Other user pluggable components (like DAGScheduler, HistoryServiceLogging 
> etc.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
