Attila Sasvari created OOZIE-2985:
-------------------------------------

             Summary: If LauncherAM fails, Oozie is not notified in a timely 
manner
                 Key: OOZIE-2985
                 URL: https://issues.apache.org/jira/browse/OOZIE-2985
             Project: Oozie
          Issue Type: Bug
            Reporter: Attila Sasvari


I've noticed that if LauncherAM fails, Oozie is notified about the launcher's 
failure only after a long delay. In the meantime, the workflow appears to be 
running.

{{oozie job -oozie http://localhost:11000/oozie -config 
examples/apps/datelist-java-main/job.properties  -info  
0000000-170712153835057-oozie-asas-W}}
{code}
0000000-170712153835057-oozie-asas-W@java1                                    
RUNNING   application_1499866588585_0001RUNNING    -         
{code}

I looked at the YARN logs for the launcher and saw that the launcher had failed. 
For example, in my case, during development, the oozie-sharelib launcher was not 
found:
{code}
Error: Could not find or load main class 
org.apache.oozie.action.hadoop.LauncherAM
{code}

The problem is that only after the specified timeout (10 minutes by default) do 
we see that the workflow has actually failed/errored.

{code}
Created       : 2017-07-12 13:38 GMT
Started       : 2017-07-12 13:38 GMT
Last Modified : 2017-07-12 13:49 GMT
...
0000000-170712153835057-oozie-asas-W@java1                                    
ERROR     application_1499866588585_0001FAILED/KILLED-         
{code} 

The problem might be that in the {{start()}} method of {{JavaActionExecutor}}, 
{{check()}} runs too soon after the launcher is submitted.

{code}
LOG.debug("Starting action " + action.getId() + " getting Action File System");
FileSystem actionFs = context.getAppFileSystem();
LOG.debug("Preparing action Dir through copying " + context.getActionDir());
prepareActionDir(actionFs, context);
LOG.debug("Action Dir is ready. Submitting the action ");
submitLauncher(actionFs, context, action);
LOG.debug("Action submit completed. Performing check ");
check(context, action);
LOG.debug("Action check is done after submission");
{code}

There should be some delay after {{submitLauncher()}} before {{check()}}.
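
One possible shape for such a delay is to poll the launcher state a few times with a short pause, rather than checking once right after submission. The sketch below is only an illustration of that idea, not Oozie code: {{LauncherStatePoller}}, {{pollUntilTerminal()}}, the attempt count and the delay are all hypothetical, and the launcher state is stood in for by a plain {{Supplier<String>}} whose values mirror YARN application state names.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.function.Supplier;

// Hypothetical helper, NOT part of Oozie: polls a launcher's state a few times
// with a short pause instead of checking only once right after submission.
final class LauncherStatePoller {

    // States after which the launcher will not change anymore
    // (names mirror YARN application states, used here as plain strings).
    static boolean isTerminal(String state) {
        return "FAILED".equals(state) || "KILLED".equals(state) || "FINISHED".equals(state);
    }

    // Reads the state up to maxAttempts times, sleeping delayMillis between
    // reads, and returns the last state seen. Stops early on a terminal state.
    static String pollUntilTerminal(Supplier<String> stateSource, int maxAttempts, long delayMillis) {
        String state = stateSource.get();
        for (int attempt = 1; attempt < maxAttempts && !isTerminal(state); attempt++) {
            try {
                Thread.sleep(delayMillis);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and give up polling early.
                Thread.currentThread().interrupt();
                return state;
            }
            state = stateSource.get();
        }
        return state;
    }

    public static void main(String[] args) {
        // Simulated launcher that fails shortly after being accepted.
        Iterator<String> states = Arrays.asList("ACCEPTED", "RUNNING", "FAILED").iterator();
        String last = pollUntilTerminal(states::next, 5, 10L);
        System.out.println("Launcher state after polling: " + last);
    }
}
```

With a bounded number of attempts, a launcher that dies right after submission would be detected within a few seconds, while a healthy launcher only delays {{start()}} by at most {{maxAttempts - 1}} pauses.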



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
