[ 
https://issues.apache.org/jira/browse/OOZIE-2985?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16218807#comment-16218807
 ] 

Denes Bodo commented on OOZIE-2985:
-----------------------------------

[~pbacsko] Am I right that Oozie can pass a notification URL to the job, which 
the job can then use to call back to Oozie? Could LauncherAM use the same 
mechanism to notify Oozie when main() starts running?
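
A rough sketch of what such a start notification could look like, assuming 
LauncherAM is handed a callback URL. The class name, the "id" and "status" 
query parameters, and the callback path used in main() are illustrative 
assumptions, not confirmed Oozie API:

```java
import java.io.IOException;
import java.io.UnsupportedEncodingException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

// Hypothetical helper; not part of Oozie.
public class LauncherStartNotifier {

    /** Builds the callback URL. The "id" and "status" parameter names are assumptions. */
    static String buildCallbackUrl(String baseUrl, String actionId, String status) {
        try {
            return baseUrl + "?id=" + URLEncoder.encode(actionId, "UTF-8")
                    + "&status=" + URLEncoder.encode(status, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    /** Sends a best-effort GET. Returns the HTTP status code, or -1 if the call failed. */
    static int notifyStarted(String callbackUrl) {
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(callbackUrl).openConnection();
            conn.setRequestMethod("GET");
            conn.setConnectTimeout(5000);
            conn.setReadTimeout(5000);
            try {
                return conn.getResponseCode();
            } finally {
                conn.disconnect();
            }
        } catch (IOException e) {
            return -1; // a failed notification must not break the launcher itself
        }
    }

    public static void main(String[] args) {
        // Only builds and prints the URL; call notifyStarted(url) to actually send it.
        String url = buildCallbackUrl("http://localhost:11000/oozie/callback",
                "0000000-170712153835057-oozie-asas-W@java1", "RUNNING");
        System.out.println(url);
    }
}
```

If no such notification arrives within a short window, Oozie could treat the 
launcher as suspect and check it earlier than the regular timeout.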

> If LauncherAM fails, Oozie is not notified in a timely manner
> -------------------------------------------------------------
>
>                 Key: OOZIE-2985
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2985
>             Project: Oozie
>          Issue Type: Bug
>            Reporter: Attila Sasvari
>
> I've noticed that if LauncherAM fails, Oozie learns about the launcher's 
> failure only after a long delay. In the meantime, the workflow appears to 
> be still running.
> {{oozie job -oozie http://localhost:11000/oozie -config 
> examples/apps/datelist-java-main/job.properties  -info  
> 0000000-170712153835057-oozie-asas-W}}
> {code}
> 0000000-170712153835057-oozie-asas-W@java1                                    
> RUNNING   application_1499866588585_0001RUNNING    -         
> {code}
> I looked at the YARN logs for the launcher and saw that the launcher had 
> failed. For example, in my case, during development, the oozie-sharelib 
> launcher class was not found:
> {code}
> Error: Could not find or load main class 
> org.apache.oozie.action.hadoop.LauncherAM
> {code}
> The problem is that we only see that the workflow has actually failed 
> after the configured timeout (10 minutes by default) has elapsed.
> {code}
> Created       : 2017-07-12 13:38 GMT
> Started       : 2017-07-12 13:38 GMT
> Last Modified : 2017-07-12 13:49 GMT
> ...
> 0000000-170712153835057-oozie-asas-W@java1                                    
> ERROR     application_1499866588585_0001FAILED/KILLED-         
> {code} 
> The problem might be that the {{start()}} method of {{JavaActionExecutor}} 
> calls {{check()}} too soon after submitting the launcher, before the 
> launcher's failure is visible.
> {code}
> LOG.debug("Starting action " + action.getId() + " getting Action File System");
> FileSystem actionFs = context.getAppFileSystem();
> LOG.debug("Preparing action Dir through copying " + context.getActionDir());
> prepareActionDir(actionFs, context);
> LOG.debug("Action Dir is ready. Submitting the action ");
> submitLauncher(actionFs, context, action);
> LOG.debug("Action submit completed. Performing check ");
> check(context, action);
> LOG.debug("Action check is done after submission");
> {code}
> There should be some delay after {{submitLauncher()}} before {{check()}}.
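
One way to picture the delay suggested above is a short polling loop that 
repeats the check until the launcher leaves its initial state. This is only 
a sketch under assumptions: the LauncherState enum, the probe supplier, and 
the attempt and delay values are hypothetical stand-ins, not Oozie code.

```java
import java.util.function.Supplier;

// Hypothetical illustration; not part of JavaActionExecutor.
public class DelayedCheckSketch {

    /** Stand-in for the launcher's externally visible state. */
    enum LauncherState { SUBMITTED, RUNNING, FAILED }

    /**
     * Probes the state, then waits and probes again while it is still SUBMITTED,
     * up to maxAttempts probes in total. Returns the last state observed.
     */
    static LauncherState checkWithDelay(Supplier<LauncherState> probe,
                                        int maxAttempts, long delayMillis) {
        LauncherState state = probe.get();
        for (int attempt = 1;
             attempt < maxAttempts && state == LauncherState.SUBMITTED;
             attempt++) {
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the interrupt and stop waiting
                return state;
            }
            state = probe.get();
        }
        return state;
    }

    public static void main(String[] args) {
        // Simulated launcher whose failure becomes visible on the third probe.
        int[] calls = {0};
        Supplier<LauncherState> probe = () ->
                ++calls[0] < 3 ? LauncherState.SUBMITTED : LauncherState.FAILED;
        System.out.println(checkWithDelay(probe, 5, 10L));
    }
}
```

A bounded loop keeps start() from blocking indefinitely; if the state is 
still unchanged after the last attempt, the regular periodic check takes over.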



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
