Attila Sasvari created OOZIE-2985:
-------------------------------------
Summary: If LauncherAM fails, Oozie is not notified in a timely manner
Key: OOZIE-2985
URL: https://issues.apache.org/jira/browse/OOZIE-2985
Project: Oozie
Issue Type: Bug
Reporter: Attila Sasvari
I've noticed that if the LauncherAM fails, Oozie is notified of the launcher's
failure only after a long delay, which gives the impression that the workflow
is still running.
For example, {{oozie job -oozie http://localhost:11000/oozie -config examples/apps/datelist-java-main/job.properties -info 0000000-170712153835057-oozie-asas-W}} still reports the action as running:
{code}
0000000-170712153835057-oozie-asas-W@java1
RUNNING    application_1499866588585_0001    RUNNING    -
{code}
Looking at the YARN logs for the launcher, I saw that the launcher had already
failed. In my case, during development, the oozie-sharelib launcher was not
found:
{code}
Error: Could not find or load main class org.apache.oozie.action.hadoop.LauncherAM
{code}
The problem is that only after the timeout (10 minutes by default) do we see
that the workflow has actually failed/errored:
{code}
Created : 2017-07-12 13:38 GMT
Started : 2017-07-12 13:38 GMT
Last Modified : 2017-07-12 13:49 GMT
...
0000000-170712153835057-oozie-asas-W@java1
ERROR      application_1499866588585_0001    FAILED/KILLED    -
{code}
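The problem might be that in {{JavaActionExecutor.start()}} the check happens
too early: {{check()}} is called immediately after {{submitLauncher()}}, before
the launcher has had time to start (or fail):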
{code}
// Excerpt from JavaActionExecutor.start(); the rest of the method is omitted
LOG.debug("Starting action " + action.getId() + " getting Action File System");
FileSystem actionFs = context.getAppFileSystem();
LOG.debug("Preparing action Dir through copying " + context.getActionDir());
prepareActionDir(actionFs, context);
LOG.debug("Action Dir is ready. Submitting the action ");
// Submits the LauncherAM application to YARN
submitLauncher(actionFs, context, action);
LOG.debug("Action submit completed. Performing check ");
// Runs right after submission, so the LauncherAM usually has not failed yet
check(context, action);
LOG.debug("Action check is done after submission");
{code}
There should be some delay after {{submitLauncher()}} before {{check()}}.
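One way to add that delay, sketched here only as an illustration under stated assumptions, is to wait briefly after submission and then poll the launcher's state a bounded number of times, stopping as soon as it reaches a terminal YARN state. {{LauncherStatePoller}}, {{pollUntilTerminal}} and the {{Supplier<String>}} state source are hypothetical and not part of Oozie; in a real executor the supplier would query YARN for the launcher application's state. The sketch assumes Java 9+ ({{Set.of}}, {{List.of}}).

{code}
import java.time.Duration;
import java.util.Iterator;
import java.util.List;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical helper, not Oozie code: polls the launcher state a bounded
// number of times instead of checking it once right after submission.
public final class LauncherStatePoller {

    // YARN application states after which the launcher will not change again.
    private static final Set<String> TERMINAL_STATES = Set.of("FINISHED", "FAILED", "KILLED");

    private LauncherStatePoller() {
    }

    // Sleeps for initialDelay, then queries the state up to maxPolls times,
    // sleeping pollInterval between queries. Returns the first terminal state
    // seen, or the last (non-terminal) state if the launcher is still running.
    public static String pollUntilTerminal(Supplier<String> launcherState,
                                           Duration initialDelay,
                                           Duration pollInterval,
                                           int maxPolls) throws InterruptedException {
        if (maxPolls < 1) {
            throw new IllegalArgumentException("maxPolls must be at least 1");
        }
        Thread.sleep(initialDelay.toMillis());
        String state = launcherState.get();
        for (int i = 1; i < maxPolls && !TERMINAL_STATES.contains(state); i++) {
            Thread.sleep(pollInterval.toMillis());
            state = launcherState.get();
        }
        return state;
    }

    public static void main(String[] args) throws InterruptedException {
        // Simulated launcher: accepted, then running, then failing
        // (e.g. because the LauncherAM class could not be loaded).
        Iterator<String> states = List.of("ACCEPTED", "RUNNING", "FAILED", "FAILED").iterator();
        String result = pollUntilTerminal(states::next,
                Duration.ofMillis(10), Duration.ofMillis(10), 5);
        System.out.println(result); // FAILED
    }
}
{code}

Compared with a single fixed sleep, bounded polling returns as soon as the launcher fails, so a broken launcher is reported within seconds while a healthy one only costs a few short checks. Since {{start()}} would block while polling, the delay and poll count would need to stay small.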