[ 
https://issues.apache.org/jira/browse/OOZIE-1065?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Virag Kothari updated OOZIE-1065:
---------------------------------

    Attachment: OOZIE-1065.patch

When a rerun command is issued on a killed coordinator, the bundle action's 
pending flag is not reset, so the bundle's state transition never happens. The 
patch notifies the parent of the killed coordinator so that the pending flag is 
reset.
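
To illustrate the idea, here is a minimal toy model of the problem. It is not 
Oozie code and not the patch itself. The classes Bundle and BundleAction, the 
method names, and the notifyParentWhenKilled switch are all hypothetical, and 
the sketch assumes that the bundle status is only recomputed when none of its 
actions is pending.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model; these are NOT Oozie's real classes.
class BundlePendingSketch {

    enum Status { RUNNING, RUNNINGWITHERROR, SUCCEEDED, KILLED, DONEWITHERROR }

    /** One bundle action tracks one child coordinator. */
    static class BundleAction {
        final String coordName;
        Status coordStatus;
        boolean pending; // true while the bundle waits for the coordinator to report back

        BundleAction(String coordName, Status coordStatus) {
            this.coordName = coordName;
            this.coordStatus = coordStatus;
        }
    }

    static class Bundle {
        final List<BundleAction> actions = new ArrayList<>();
        Status status;

        BundleAction find(String coordName) {
            for (BundleAction a : actions) {
                if (a.coordName.equals(coordName)) return a;
            }
            throw new IllegalArgumentException("unknown coordinator: " + coordName);
        }

        /** Rerun one coordinator; the flag switches the assumed fix on or off. */
        void rerun(String coordName, boolean notifyParentWhenKilled) {
            BundleAction a = find(coordName);
            a.pending = true;
            status = Status.RUNNINGWITHERROR;
            if (a.coordStatus == Status.KILLED) {
                // A killed coordinator cannot be rerun, so no status update will
                // ever arrive. With the fix, the parent resets the flag right away.
                if (notifyParentWhenKilled) a.pending = false;
            } else {
                a.coordStatus = Status.RUNNING;
            }
        }

        /** Assumed transition rule: skip the bundle while any action is pending. */
        void refreshStatus() {
            boolean anyRunning = false;
            boolean allSucceeded = true;
            for (BundleAction a : actions) {
                if (a.pending) return; // stuck here if the flag is never reset
                if (a.coordStatus == Status.RUNNING) anyRunning = true;
                if (a.coordStatus != Status.SUCCEEDED) allSucceeded = false;
            }
            if (anyRunning) return; // still in progress, leave the status alone
            status = allSucceeded ? Status.SUCCEEDED : Status.DONEWITHERROR;
        }
    }

    /** Case 1 from the issue: one KILLED coordinator, two SUCCEEDED. */
    static Bundle sampleBundle() {
        Bundle b = new Bundle();
        b.actions.add(new BundleAction("coord-job-1", Status.KILLED));
        b.actions.add(new BundleAction("coord-job-2", Status.SUCCEEDED));
        b.actions.add(new BundleAction("coord-job-3", Status.SUCCEEDED));
        b.status = Status.DONEWITHERROR;
        return b;
    }

    public static void main(String[] args) {
        Bundle without = sampleBundle();
        without.rerun("coord-job-1", false);
        without.refreshStatus();
        System.out.println("without reset: " + without.status);

        Bundle with = sampleBundle();
        with.rerun("coord-job-1", true);
        with.refreshStatus();
        System.out.println("with reset:    " + with.status);
    }
}
```

In this model the bundle without the reset stays in RUNNINGWITHERROR, while the 
bundle with the reset can move back to a terminal state once its status is 
recomputed.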

Patch for review at
https://reviews.apache.org/r/8282/
                
> bundle status does not transit after rerun
> ------------------------------------------
>
>                 Key: OOZIE-1065
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1065
>             Project: Oozie
>          Issue Type: Bug
>          Components: bundle
>    Affects Versions: 3.3.0
>            Reporter: michelle chiang
>            Priority: Minor
>         Attachments: OOZIE-1065.patch
>
>
> 2 similar cases.
> 1. submit a bundle, with 3 coord jobs. kill coord-job-1.
>    coord-job-1 becomes KILLED with both actions KILLED.
>    the other 2 coord jobs finished SUCCEEDED. and bundle job is DONEWITHERROR.
>    rerun bundle job, -coordinator=coord-job-1. as soon as the rerun command 
> is issued, bundle job status is RUNNINGWITHERROR.
>    because coord-job-1 is in KILLED, it cannot be rerun.
>    but bundle job stays in RUNNINGWITHERROR when all 3 coord jobs in terminal 
> states (KILLED, SUCCEEDED, SUCCEEDED).
>    kill the bundle job. then bundle transit to KILLED for a second, then back 
> to RUNNINGWITHERROR.
> 2. submit a bundle, with 3 coord jobs. kill coord-job-1.
>    coord-job-1 becomes DONEWITHERROR with 1 action SUCCEEDED, and 1 action 
> KILLED.
>    the other 2 coord jobs finished SUCCEEDED. and bundle job is DONEWITHERROR.
>    rerun bundle job, -coordinator=coord-job-1. as soon as the rerun command 
> is issued, bundle job status is RUNNINGWITHERROR.
>    coord-job-1 is in RUNNING after rerun.
>    but bundle job stays in RUNNINGWITHERROR, and does not transit to RUNNING, 
> when 1 coord job RUNNING and other 2 coord job SUCCEEDED.
>    and bundle job stays in RUNNINGWITHERROR when all 3 coord jobs in terminal 
> states (DONEWITHERROR, SUCCEEDED, SUCCEEDED).
>    kill the bundle job. then bundle transit to KILLED for a second, then back 
> to RUNNINGWITHERROR.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira
