[ https://issues.apache.org/jira/browse/OOZIE-994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13462188#comment-13462188 ]
Virag Kothari commented on OOZIE-994:
-------------------------------------
I think it's okay if ResumeXCommand has no way to distinguish between a
workflow suspended by ActionCheckX and one suspended for some other reason.
If it sees the action status as START_MANUAL, it can queue the ActionStartX.
(Cleanup of the actionDir can be done before starting.)
If you want all of the checks to be done against the wrapped exceptions first,
can we have something like this?
{code}
AEException e = null;
for (...) {
    if (match(ex.getCause())) {
        return new AEException(".."); // wrapped cause matched: return immediately
    }
    if (match(ex)) {
        e = new AEException(".."); // outer exception matched: don't return immediately
    }
}
if (e != null) {
    return e;
}
{code}
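To make the idea above concrete, here is a minimal, self-contained Java sketch of the "check the wrapped cause first, fall back to the outer exception" loop. It is only an illustration, not Oozie's actual convertException: the AEException class below is a stand-in for the shorthand used above, and the KNOWN list, the class name ConvertExceptionSketch and the message strings are all assumptions made up for the example. The sketch also keeps the first outer match rather than the last, which is a choice of this example.

```java
import java.io.IOException;
import java.net.ConnectException;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the "AEException" shorthand used in the comment.
class AEException extends Exception {
    AEException(String message, Throwable cause) {
        super(message, cause);
    }
}

class ConvertExceptionSketch {
    // Hypothetical list of exception types to match against (not Oozie's real list).
    static final List<Class<? extends Throwable>> KNOWN =
            Arrays.asList(ConnectException.class, IOException.class);

    static AEException convert(Exception ex) {
        AEException fallback = null;
        Throwable cause = ex.getCause();
        for (Class<? extends Throwable> type : KNOWN) {
            if (type.isInstance(cause)) {
                // The wrapped cause matched: return immediately.
                return new AEException("matched cause: " + type.getSimpleName(), cause);
            }
            if (fallback == null && type.isInstance(ex)) {
                // The outer exception matched: remember it, but keep checking the cause.
                fallback = new AEException("matched outer: " + type.getSimpleName(), ex);
            }
        }
        // Null when neither the cause nor the outer exception matched anything.
        return fallback;
    }
}
```

With this shape, a RuntimeException wrapping a ConnectException is reported by its cause, while a ConnectException whose cause is a plain IOException is still reported by the cause, even though the outer exception matched an earlier entry in the list.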
> ActionCheckXCommand does not handle failures properly
> -----------------------------------------------------
>
> Key: OOZIE-994
> URL: https://issues.apache.org/jira/browse/OOZIE-994
> Project: Oozie
> Issue Type: Bug
> Components: workflow
> Affects Versions: 3.2.0
> Reporter: Alejandro Abdelnur
> Assignee: Robert Kanter
> Priority: Critical
> Fix For: trunk
>
> Attachments: OOZIE-994.patch, OOZIE-994.patch, OOZIE-994.patch
>
>
> If the JT restarts or dies and running jobs are lost or the JT is not
> reachable, Oozie ActionCheckXCommand will never fail the workflow job.
> There seem to be 2 issues here:
> * convertException is not receiving the root cause exception anymore, but
> always a HadoopAccessorException wrapping the root cause exception. We should
> modify convertException to inspect the cause exception as well.
> * ActionCheckXCommand does not apply the retry handling logic of
> ActionStartXCommand.