[ https://issues.apache.org/jira/browse/OOZIE-2568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
WangMeng updated OOZIE-2568:
----------------------------
Description:
There is a bug in the automatic retry of the SSH action.
For example, I have configured the following retry property:
{code}
<!-- oozie-site.xml -->
<property>
    <name>oozie.service.LiteWorkflowStoreService.user.retry.error.code.ext</name>
    <value>ALL</value>
</property>
{code}
My SSH action is:
{code}
<action name="ssh-afbb" retry-max="3" retry-interval="1">
    <ssh xmlns="uri:oozie:ssh-action:0.1">
        <host>wangmeng@XXXX</host>
        <command>sh /data/wangmeng/hue_sh4.sh</command>
        <capture-output/>
    </ssh>
    <ok to="End"/>
    <error to="Kill"/>
</action>
{code}
When this action fails, the logs suggest that it is being retried automatically, for example:
{code}
Start action [0000000-160612140701137-oozie-oozi-W@ssh-afbb] with user-retry state : userRetryCount [1], userRetryMax [3], userRetryInterval [1]
{code}
However, it does not actually rerun.
The reason is that if the PID of the previous attempt is still recorded in the XXXX.pid file in this SSH action's log directory, the SSH action does not launch a new process to rerun, and it never checks whether that PID's process has already finished. In my tests, that process had already finished by the time Oozie retried the action.
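The missing liveness check can be illustrated with a minimal shell sketch. It is not Oozie's SSH action code: the function name `retry_decision` and the pid-file path are hypothetical, and it assumes the pid file holds a single PID written by the previous attempt under the same remote user. The point is that an existing pid file should block a rerun only while the recorded process is still alive, which `kill -0` can test without sending a signal.

{code}
#!/bin/sh
# Hypothetical sketch, not Oozie's SSH action implementation: decide whether a
# retried attempt should launch a new process, given the pid file written by
# the previous attempt.
#
# Prints "running" if the PID recorded in the file is still alive, and
# "rerun" otherwise (no pid file, empty or non-numeric PID, finished process).
retry_decision() {
    pid_file="$1"

    # No pid file: no earlier attempt is recorded, so launch a new process.
    if [ ! -f "$pid_file" ]; then
        echo "rerun"
        return 0
    fi

    pid=$(cat "$pid_file")

    # Reject an empty or non-numeric PID instead of trusting it.
    case $pid in
        ''|*[!0-9]*) echo "rerun"; return 0 ;;
    esac

    # kill -0 sends no signal; it only tests whether the process exists and
    # may be signalled by this user (the same user that launched it).
    if [ "$pid" -gt 0 ] && kill -0 "$pid" 2>/dev/null; then
        echo "running"
    else
        # The previous attempt has finished, so the retry must start again.
        echo "rerun"
    fi
}
{code}

For example, `retry_decision /path/to/XXXX.pid` (hypothetical path) on the remote host would print `rerun` once the earlier `sh /data/wangmeng/hue_sh4.sh` process has exited. A PID can in principle be reused by an unrelated process, so a real fix might also need to compare something beyond the PID; this sketch does not handle that.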
was:
There is a bug in the automatic retry of the SSH action.
For example, I have configured the following retry property:
{code}
<!-- oozie-site.xml -->
<property>
    <name>oozie.service.LiteWorkflowStoreService.user.retry.error.code.ext</name>
    <value>ALL</value>
</property>
{code}
My SSH action is:
{code}
<action name="ssh-afbb" retry-max="3" retry-interval="1">
    <ssh xmlns="uri:oozie:ssh-action:0.1">
        <host>wangmeng@XXXX</host>
        <command>sh /data/wangmeng/hue_sh4.sh</command>
        <capture-output/>
    </ssh>
    <ok to="End"/>
    <error to="Kill"/>
</action>
{code}
However, when this action fails, the logs suggest that it is being retried automatically, for example:
{code}
Start action [0000000-160612140701137-oozie-oozi-W@ssh-afbb] with user-retry state : userRetryCount [1], userRetryMax [3], userRetryInterval [1]
{code}
However, it does not actually rerun.
The reason is that when the PID of the previous attempt exists in the XXXX.pid file in this SSH action's log directory, the SSH action does not launch a new process to rerun, regardless of whether that process has finished.
> SSH action cannot retry automatically when it fails
> ---------------------------------------------------
>
> Key: OOZIE-2568
> URL: https://issues.apache.org/jira/browse/OOZIE-2568
> Project: Oozie
> Issue Type: Bug
> Components: core
> Affects Versions: 4.2.0
> Reporter: WangMeng
> Attachments: OOZIE-2568.01.patch
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)