[
https://issues.apache.org/jira/browse/TEZ-2745?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14712400#comment-14712400
]
Jeff Zhang commented on TEZ-2745:
---------------------------------
bq. Should we just set max attempts to 1 ?
That would mean recovery won't work, because recovery depends on launching another AM attempt.
bq. Any other error would likely be a non-recoverable error. Special casing
ClassNotFoundException or other such future causes may be wasted effort?
If the exception comes from DAG-related components
(EdgeManager/VertexManager/InputInitializer), it should just fail the DAG but
keep the Tez session alive. If the exception comes from AM-related components
(DAGScheduler/HistoryServiceLogging), there is no need to relaunch the AM. (This
is the same as not launching another task attempt when the last task
attempt failed due to ClassNotFoundException.)
bq. Or is the suggestion to convert ClassNotFoundException to
AMUserCodeException and handle it like that?
I plan to add a checked exception to the method ReflectionUtils#createClazzInstance
so that the caller can decide what to do.
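As a rough illustration of the checked-exception idea, here is a minimal, self-contained sketch. It is not the Tez code and does not reflect the real signature of ReflectionUtils#createClazzInstance. The names PluginLoadingSketch, PluginLoadException, Outcome, createInstance, loadDagComponent and loadAmComponent are all hypothetical, chosen only to show how a checked exception lets each caller pick its own failure handling.

```java
public class PluginLoadingSketch {

    /** Hypothetical checked exception; not part of the Tez API. */
    static class PluginLoadException extends Exception {
        PluginLoadException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /** Hypothetical outcomes, mirroring the two cases discussed in the comment. */
    enum Outcome { LOADED, FAIL_DAG_KEEP_SESSION, FAIL_AM_NO_RELAUNCH }

    /**
     * Instantiates a class by name via its no-arg constructor.
     * Reflection failures surface as a checked exception, so the caller
     * is forced to decide how to react instead of letting the process crash.
     */
    static <T> T createInstance(String className, Class<T> expectedType)
            throws PluginLoadException {
        try {
            Class<?> clazz = Class.forName(className);
            return expectedType.cast(clazz.getDeclaredConstructor().newInstance());
        } catch (ReflectiveOperationException | ClassCastException e) {
            throw new PluginLoadException("Unable to load " + className, e);
        }
    }

    /** Assumed policy for DAG-level user code: fail the DAG, keep the session. */
    static Outcome loadDagComponent(String className) {
        try {
            createInstance(className, Object.class);
            return Outcome.LOADED;
        } catch (PluginLoadException e) {
            return Outcome.FAIL_DAG_KEEP_SESSION;
        }
    }

    /** Assumed policy for AM-level plugins: fail without asking for a relaunch. */
    static Outcome loadAmComponent(String className) {
        try {
            createInstance(className, Object.class);
            return Outcome.LOADED;
        } catch (PluginLoadException e) {
            return Outcome.FAIL_AM_NO_RELAUNCH;
        }
    }

    public static void main(String[] args) {
        // Both class names are made up and are expected to be missing.
        System.out.println(loadDagComponent("com.example.MissingEdgeManager"));
        System.out.println(loadAmComponent("com.example.MissingDagScheduler"));
    }
}
```

The point of the sketch is only that the decision moves to the call site: the same loading failure can map to "fail the DAG" in one caller and "fail the AM without relaunch" in another.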
> ClassNotFoundException of user code should fail dag
> ---------------------------------------------------
>
> Key: TEZ-2745
> URL: https://issues.apache.org/jira/browse/TEZ-2745
> Project: Apache Tez
> Issue Type: Bug
> Affects Versions: 0.7.0, 0.5.4, 0.6.2, 0.8.0-alpha
> Reporter: Jeff Zhang
> Assignee: Jeff Zhang
>
> This ClassNotFoundException is currently not caught. The current behavior is that the AM
> crashes and is relaunched until the max app attempt count is reached.
> Here is the user code that runs in the AM:
> * EdgeManager
> * VertexManager
> * InputInitializer
> * OutputCommitter
> * Other user pluggable components (like DAGScheduler, HistoryServiceLogging
> etc.)