Robert Kanter created OOZIE-2126:
------------------------------------

             Summary: SSH action can be too fast for Oozie sometimes
                 Key: OOZIE-2126
                 URL: https://issues.apache.org/jira/browse/OOZIE-2126
             Project: Oozie
          Issue Type: Bug
          Components: action
            Reporter: Robert Kanter
            Assignee: Robert Kanter


We've seen a timing problem with the SSH action where the callback arrives too 
quickly: the action is still in PREP and has not yet transitioned to RUNNING.  
Oozie then ignores the callback, which means it won't find out that the action 
completed until it polls for the status itself (default=10min).  This happened 
in an HA setup, but I think it could happen even without HA.  Adding a 
30-second delay to the ssh scripts fixed the problem, but ideally we should 
come up with a better solution.

Here are the relevant logs:
{noformat}
2015-01-16 18:00:12,916 INFO org.apache.oozie.action.ssh.SshActionExecutor: SERVER[FOO] USER[foo] GROUP[-] TOKEN[] APP[${job_name}] JOB[0000027-150113223634420-oozie-oozi-W] ACTION[0000027-150113223634420-oozie-oozi-W@action-1] start() begins
2015-01-16 18:00:12,917 INFO org.apache.oozie.action.ssh.SshActionExecutor: SERVER[FOO] USER[foo] GROUP[-] TOKEN[] APP[${job_name}] JOB[0000027-150113223634420-oozie-oozi-W] ACTION[0000027-150113223634420-oozie-oozi-W@action-1] Attempting to copy ssh base scripts to remote host [[email protected]]
2015-01-16 18:00:15,769 INFO org.apache.oozie.servlet.CallbackServlet: SERVER[FOO] USER[-] GROUP[-] TOKEN[-] APP[-] JOB[0000027-150113223634420-oozie-oozi-W] ACTION[0000027-150113223634420-oozie-oozi-W@action-1] callback for action [0000027-150113223634420-oozie-oozi-W@action-1]
2015-01-16 18:00:15,774 ERROR org.apache.oozie.command.wf.CompletedActionXCommand: SERVER[FOO] USER[-] GROUP[-] TOKEN[] APP[-] JOB[0000027-150113223634420-oozie-oozi-W] ACTION[0000027-150113223634420-oozie-oozi-W@action-1] XException,
org.apache.oozie.command.CommandException: E0800: Action it is not running its in [PREP] state, action [0000027-150113223634420-oozie-oozi-W@action-1]
        at org.apache.oozie.command.wf.CompletedActionXCommand.eagerVerifyPrecondition(CompletedActionXCommand.java:77)
        at org.apache.oozie.command.XCommand.call(XCommand.java:251)
        at org.apache.oozie.service.CallableQueueService$CallableWrapper.run(CallableQueueService.java:174)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:745)
{noformat}
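
To make the race concrete, here is a rough sketch of one possible direction: 
instead of rejecting a callback that arrives while the action is still in 
PREP, defer it a bounded number of times and retry once the action may have 
reached RUNNING.  This is not Oozie code and not necessarily the fix that 
should be adopted; every name in it (EarlyCallbackSketch, Outcome, 
MAX_DEFERRALS, retryPending) is hypothetical and only illustrates the idea.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Hypothetical model of the race; none of these types exist in Oozie.
class EarlyCallbackSketch {
    enum Status { PREP, RUNNING, DONE }
    enum Outcome { PROCESSED, DEFERRED, DROPPED }

    // Assumed retry limit, chosen arbitrarily for the illustration.
    static final int MAX_DEFERRALS = 3;

    static final class Action {
        Status status = Status.PREP;
    }

    static final class Callback {
        final Action action;
        int deferrals = 0;
        Callback(Action action) { this.action = action; }
    }

    // Callbacks that arrived before their action reached RUNNING.
    private final Queue<Callback> pending = new ArrayDeque<>();

    // Handle a completion callback; defer it if the action is still in PREP.
    Outcome handle(Callback cb) {
        switch (cb.action.status) {
            case RUNNING:
                cb.action.status = Status.DONE;
                return Outcome.PROCESSED;
            case PREP:
                if (cb.deferrals < MAX_DEFERRALS) {
                    cb.deferrals++;
                    pending.add(cb);
                    return Outcome.DEFERRED;
                }
                return Outcome.DROPPED;
            default:
                // Already DONE: nothing left to do with this callback.
                return Outcome.DROPPED;
        }
    }

    // Retry each deferred callback once; returns how many were processed.
    int retryPending() {
        int processed = 0;
        int n = pending.size();
        for (int i = 0; i < n; i++) {
            if (handle(pending.poll()) == Outcome.PROCESSED) {
                processed++;
            }
        }
        return processed;
    }

    public static void main(String[] args) {
        EarlyCallbackSketch sketch = new EarlyCallbackSketch();
        Action action = new Action();
        // Callback arrives while the action is still in PREP.
        System.out.println(sketch.handle(new Callback(action)));
        // start() finishes and the action transitions to RUNNING.
        action.status = Status.RUNNING;
        System.out.println(sketch.retryPending());
    }
}
```

In a real server the retry would be driven by a delayed requeue rather than an 
explicit call, and the bound keeps a callback for an action that never leaves 
PREP from being retried forever.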



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
