[ 
https://issues.apache.org/jira/browse/OOZIE-3156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16496381#comment-16496381
 ] 

Andras Piros commented on OOZIE-3156:
-------------------------------------

Thanks for the new patch [~txsing]!

Here is the next round of comments:
* {{SshActionExecutor#handleRetry()}}: {{sleepBeforeRetryMs /= 2;}} should 
rather be {{sleepBeforeRetryMs *= 2;}}
* the return value of {{SshActionExecutor#handleRetry()}} is not used in 
caller code, so there is no real exponential backoff - {{initWaitTime}} 
will always be reused
* in {{TestSshActionExecutor#testSshCheckWithHostConnectFailure()}} it's 
unclear to me whether {{echo "prop1=something"}} would always fail on the 
first attempt. We need to inject a failure somehow to be on the safe side, or, if 
that is already present, extract methods in the test case with descriptive names 
so it's clear what's going on
* extending {{DG_SshActionExtension.twiki}} goes in the right direction. 
Still, we need to introduce 
{{oozie-default.xml#oozie.action.ssh.check.retries.max}} with the default value 
{{3}}, and also mention it in the docs
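
To illustrate the first two points, here is a minimal sketch of a doubling backoff where the caller stores the returned delay. It is not the patch code: the class {{BackoffSketch}}, the methods {{nextDelayMs()}} and {{delaysFor()}}, and the cap {{maxMs}} are hypothetical names; only {{sleepBeforeRetryMs}} and {{initWaitTime}} come from the patch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; not part of SshActionExecutor.
class BackoffSketch {

    /** Returns the doubled delay, capped at maxMs. The caller must keep the result. */
    public static long nextDelayMs(long currentMs, long maxMs) {
        // Compare against maxMs / 2 so the multiplication can never exceed the cap.
        if (currentMs > maxMs / 2) {
            return maxMs;
        }
        return currentMs * 2; // multiply, not divide, so the wait grows
    }

    /** Lists the delay used before each retry, starting from initWaitTimeMs. */
    public static List<Long> delaysFor(long initWaitTimeMs, int maxRetries, long maxMs) {
        List<Long> delays = new ArrayList<>();
        long sleepBeforeRetryMs = initWaitTimeMs;
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            delays.add(sleepBeforeRetryMs);
            // Reuse the returned value; otherwise every retry waits initWaitTimeMs.
            sleepBeforeRetryMs = nextDelayMs(sleepBeforeRetryMs, maxMs);
        }
        return delays;
    }

    public static void main(String[] args) {
        System.out.println(delaysFor(1000L, 3, 60000L)); // prints [1000, 2000, 4000]
    }
}
```

With an initial wait of 1000 ms and three retries, the waits are 1000, 2000 and 4000 ms. If the return value were dropped, all three would stay at 1000 ms.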

> SSH action status turns OK wrongly when failed to connect to host
> -----------------------------------------------------------------
>
>                 Key: OOZIE-3156
>                 URL: https://issues.apache.org/jira/browse/OOZIE-3156
>             Project: Oozie
>          Issue Type: Bug
>          Components: action
>    Affects Versions: 5.0.0
>            Reporter: TIAN XING
>            Assignee: TIAN XING
>            Priority: Major
>         Attachments: OOZIE-3156-v1.patch, OOZIE-3156-v2.patch, 
> OOZIE-3156-v3.patch, ssh-check-bug.patch
>
>
> When the {{check()}} method of {{SshActionExecutor}} is invoked, Oozie connects 
> to the host over ssh and checks whether the pid of the process that the ssh action 
> started is still there (by checking the return value of the command "{{ssh 
> <host-ip> ps -p <pid>}}") to determine whether the ssh action has completed.
> However, we found cases where Oozie fails to connect to the host during the action 
> status check (e.g., the host is under heavy load, or the network is unreliable).
> In such cases, the return value of the command "{{ssh <host-ip> ps -p <pid>}}" 
> will be 255 (ssh exits with the exit status of the remote command, or 
> with 255 if an error occurred).
> According to the current logic of {{getActionStatus()}} in 
> {{SshActionExecutor}}, the action status will be determined as OK, which may 
> not be correct. 
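
The distinction described above can be sketched as follows. This is an assumption-laden illustration, not the actual {{getActionStatus()}} logic: {{SshCheckSketch}}, {{CheckResult}} and {{interpret()}} are hypothetical names, and the sketch assumes that {{ps -p <pid>}} exits with 0 when the process exists and with a non-zero status other than 255 when it does not.

```java
// Hypothetical illustration only; not part of SshActionExecutor.
class SshCheckSketch {

    enum CheckResult { RUNNING, NOT_RUNNING, CONNECTION_ERROR }

    // ssh itself exits with 255 when it fails, for example when it cannot connect.
    static final int SSH_ERROR_EXIT_CODE = 255;

    /** Maps the exit status of "ssh <host-ip> ps -p <pid>" to a check result. */
    public static CheckResult interpret(int exitCode) {
        if (exitCode == 0) {
            return CheckResult.RUNNING;          // ps found the pid
        }
        if (exitCode == SSH_ERROR_EXIT_CODE) {
            return CheckResult.CONNECTION_ERROR; // status unknown, a retry is appropriate
        }
        return CheckResult.NOT_RUNNING;          // assumed: ps did not find the pid
    }

    public static void main(String[] args) {
        System.out.println(interpret(255)); // prints CONNECTION_ERROR
    }
}
```

Treating 255 as its own case keeps a failed connection from being read as "process finished".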



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
