[ https://issues.apache.org/jira/browse/OOZIE-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13894343#comment-13894343 ]

Rohini Palaniswamy commented on OOZIE-1687:
-------------------------------------------

To elaborate on this particular case: the bundle action has its status set to
KILLED and pending set to 1, and a coord kill is queued. Before the coord kill
can execute, StatusTransitService runs and sets the status of the coord job to
DONEWITHERROR or SUCCEEDED (race condition), because the coord job does in fact
succeed at the same time the bundle is killed. It then calls
BundleStatusUpdatedCommand. The precondition check in that command fails: the
bundle action status (KILLED) differs from the previous coord status (RUNNING),
and it also differs from the current coord status (SUCCEEDED). As a result, the
bundle action pending is left at 1 and the bundle stays in RUNNING even though
all the bundle actions/coord jobs have completed. A slight variant of this case
is that the CoordKill does happen, but StatusTransitService for the coord job
runs at the same time and still holds RUNNING as the coord job status in memory
even though the job was already killed. Subsequent bundle kills do nothing,
because all bundle actions are in a terminal state even though some of them
still have pending at 1. The only way to get the bundle to a KILLED state is to
update pending to 0 for those bundle actions.
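A minimal Java sketch of the race described above, under a hypothetical,
simplified model: the Status enum, BundleAction class and the statusUpdate and
killBeforeFix methods are illustrative assumptions, not Oozie's actual classes.
The status update is skipped when the bundle action status matches neither the
previous nor the current coord status, and the pre-fix kill skips actions that
are already terminal, so pending stays at 1.

```java
import java.util.EnumSet;

public class BundleRaceSketch {
    // Hypothetical, simplified job states (real Oozie has more).
    enum Status { RUNNING, SUCCEEDED, DONEWITHERROR, KILLED }

    // Hypothetical stand-in for a bundle action record.
    static final class BundleAction {
        Status status;
        int pending;

        BundleAction(Status status, int pending) {
            this.status = status;
            this.pending = pending;
        }
    }

    static final EnumSet<Status> TERMINAL =
            EnumSet.of(Status.SUCCEEDED, Status.DONEWITHERROR, Status.KILLED);

    /**
     * Models the precondition described in the comment: the update is applied only
     * if the bundle action status equals the coord job's previous or new status.
     * Returns true if the update was applied.
     */
    static boolean statusUpdate(BundleAction action, Status prevCoordStatus, Status newCoordStatus) {
        if (action.status != prevCoordStatus && action.status != newCoordStatus) {
            return false; // precondition fails, so pending is never decremented
        }
        action.status = newCoordStatus;
        action.pending = Math.max(0, action.pending - 1);
        return true;
    }

    /** Models the pre-fix kill: actions already in a terminal state are skipped. */
    static void killBeforeFix(BundleAction action) {
        if (TERMINAL.contains(action.status)) {
            return; // nothing to kill, so a stale pending flag is left untouched
        }
        action.status = Status.KILLED;
        action.pending = 0;
    }

    public static void main(String[] args) {
        // Bundle kill marked the action KILLED with pending=1 and queued a coord kill.
        BundleAction action = new BundleAction(Status.KILLED, 1);
        // StatusTransitService wins the race: the coord job goes RUNNING -> SUCCEEDED.
        boolean applied = statusUpdate(action, Status.RUNNING, Status.SUCCEEDED);
        System.out.println("update applied: " + applied + ", pending: " + action.pending);
        // Re-issuing the kill does not help, because the action is already terminal.
        killBeforeFix(action);
        System.out.println("after re-kill: " + action.status + ", pending: " + action.pending);
    }
}
```

Running main prints "update applied: false, pending: 1" followed by
"after re-kill: KILLED, pending: 1", which is the stuck state described above.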

This JIRA fixes the issue so that bundle action pending is not left at 1 in the
first place. Even if it was left at 1, it ensures that reissuing the bundle kill
command will decrement the pending and get the bundle to a KILLED state.
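One possible shape of the second part of the fix, as a hypothetical Java sketch
rather than the contents of OOZIE-1687-1.patch: on a bundle kill, actions that
are already in a terminal state have any stale pending decremented instead of
being skipped, so the bundle can settle in KILLED. The class, enum and method
names are assumptions for illustration.

```java
import java.util.EnumSet;
import java.util.List;

public class BundleKillFixSketch {
    // Hypothetical, simplified job states.
    enum Status { RUNNING, SUCCEEDED, DONEWITHERROR, KILLED }

    // Hypothetical stand-in for a bundle action record.
    static final class BundleAction {
        Status status;
        int pending;

        BundleAction(Status status, int pending) {
            this.status = status;
            this.pending = pending;
        }
    }

    static final EnumSet<Status> TERMINAL =
            EnumSet.of(Status.SUCCEEDED, Status.DONEWITHERROR, Status.KILLED);

    /**
     * Hypothetical kill that also cleans up terminal actions left with pending > 0.
     * Returns KILLED once every action is terminal with pending == 0, else RUNNING.
     */
    static Status killBundle(List<BundleAction> actions) {
        for (BundleAction action : actions) {
            if (TERMINAL.contains(action.status)) {
                // Already finished (possibly via the race): clear one stale pending count.
                if (action.pending > 0) {
                    action.pending--;
                }
            } else {
                // The real command would queue a coord kill; applied directly here for simplicity.
                action.status = Status.KILLED;
                action.pending = 0;
            }
        }
        boolean settled = actions.stream()
                .allMatch(x -> TERMINAL.contains(x.status) && x.pending == 0);
        return settled ? Status.KILLED : Status.RUNNING;
    }

    public static void main(String[] args) {
        List<BundleAction> actions = List.of(
                new BundleAction(Status.SUCCEEDED, 1), // stale pending left by the race
                new BundleAction(Status.KILLED, 0));
        System.out.println("bundle status: " + killBundle(actions));
    }
}
```

With the stale action decremented to pending=0, main prints
"bundle status: KILLED".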

Shwetha,
   Do you see a similar issue with suspend? If so, can you open another JIRA
for it?

> Bundle can still be in RUNNINGWITHERROR status after bundle kill
> ----------------------------------------------------------------
>
>                 Key: OOZIE-1687
>                 URL: https://issues.apache.org/jira/browse/OOZIE-1687
>             Project: Oozie
>          Issue Type: Bug
>    Affects Versions: 4.0.0
>            Reporter: Rohini Palaniswamy
>            Assignee: Rohini Palaniswamy
>             Fix For: trunk
>
>         Attachments: OOZIE-1687-1.patch
>
>
> Race condition between StatusTransitService (does not acquire lock) and 
> BundleKillCommand can leave a bundle action with pending=1 and terminal 
> status. 



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)
