[
https://issues.apache.org/jira/browse/OOZIE-1687?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13894343#comment-13894343
]
Rohini Palaniswamy commented on OOZIE-1687:
-------------------------------------------
To elaborate more on this particular case: the bundle action has its status
set to KILLED and pending set to 1, and the coord kill is queued. Before the
coord kill can execute, StatusTransitService runs and sets the status of the
coord job to DONEWITHERROR or SUCCEEDED (race condition), because the coord job
does in fact succeed at the same time as the bundle is killed, and it then
calls BundleStatusUpdatedCommand. The precondition check fails because the
bundle action status (KILLED) differs from both the previous coord status
(RUNNING) and the current coord status (SUCCEEDED). As a result, the bundle
action's pending is left at 1 and the bundle stays in RUNNING even though all
the bundle actions/coord jobs have completed. In a slight variant of this case,
CoordKill does run, but StatusTransitService runs for the coord job at the same
time and holds a RUNNING status for the coord job in memory even though the
job has already been killed. Subsequent bundle kills do nothing, because all
bundle actions are in a terminal state even though some of them still have
pending at 1. The only way to get the bundle to a KILLED state is to update
pending to 0 for those bundle actions.
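A simplified, hypothetical Java model of the precondition described above may make the failure easier to follow. It assumes (this is not taken from Oozie's source) that the status update proceeds only when the bundle action status equals either the coord job's previous or current status, and that a successful update decrements pending. All class, enum and method names are made up for illustration.

```java
/**
 * Hypothetical, simplified model of the precondition described in the comment.
 * Class, enum and method names are illustrative only; this is not Oozie's code.
 */
public class BundleStatusPreconditionSketch {

    enum Status { RUNNING, SUCCEEDED, DONEWITHERROR, KILLED }

    /** Minimal stand-in for a bundle action row. */
    static final class BundleAction {
        Status status;
        int pending;

        BundleAction(Status status, int pending) {
            this.status = status;
            this.pending = pending;
        }
    }

    /**
     * Assumed check: the update proceeds only if the bundle action status
     * matches the coord job's previous status or its current status.
     */
    static boolean preconditionPasses(BundleAction action, Status prevCoordStatus, Status coordStatus) {
        return action.status == prevCoordStatus || action.status == coordStatus;
    }

    /** Called after a coord status change; if the check fails, nothing (including pending) changes. */
    static void onCoordStatusUpdated(BundleAction action, Status prevCoordStatus, Status coordStatus) {
        if (!preconditionPasses(action, prevCoordStatus, coordStatus)) {
            return; // skipped: pending stays where it was
        }
        action.status = coordStatus;
        if (action.pending > 0) {
            action.pending--; // assumed: a successful update clears one pending unit
        }
    }

    public static void main(String[] args) {
        // Expected path: the queued coord kill runs first, coord goes RUNNING -> KILLED.
        BundleAction normal = new BundleAction(Status.KILLED, 1);
        onCoordStatusUpdated(normal, Status.RUNNING, Status.KILLED);
        System.out.println("normal: " + normal.status + " pending=" + normal.pending); // normal: KILLED pending=0

        // Race: StatusTransitService moves the coord RUNNING -> SUCCEEDED before the kill runs.
        BundleAction raced = new BundleAction(Status.KILLED, 1);
        onCoordStatusUpdated(raced, Status.RUNNING, Status.SUCCEEDED);
        System.out.println("raced: " + raced.status + " pending=" + raced.pending); // raced: KILLED pending=1
    }
}
```

In the raced case KILLED matches neither RUNNING nor SUCCEEDED, so the update is skipped and pending stays at 1, which is the stuck state described above.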
This JIRA fixes the race so that the bundle action's pending is not left at 1
in the first place. Even if it is left at 1, the fix ensures that re-issuing
the bundle kill command will decrement the pending and get the bundle to a
KILLED state.
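The intended effect of the fix on re-issued kills can be sketched the same way. This is a hypothetical model, not the attached patch: it assumes a re-issued kill queues coord kills only for non-terminal actions and decrements pending on actions that are already terminal, and that the bundle can settle in KILLED only once every action is terminal with pending at 0. All names are illustrative.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.List;
import java.util.Set;

/**
 * Hypothetical sketch of the fix's described behaviour: a re-issued bundle kill
 * releases stale pending counts on bundle actions that are already terminal.
 * Names are illustrative, not Oozie's actual classes or methods.
 */
public class BundleKillRetrySketch {

    enum Status { RUNNING, SUCCEEDED, DONEWITHERROR, KILLED, FAILED }

    static final Set<Status> TERMINAL =
            EnumSet.of(Status.SUCCEEDED, Status.DONEWITHERROR, Status.KILLED, Status.FAILED);

    /** Minimal stand-in for a bundle action (one per coordinator job). */
    static final class BundleAction {
        final String coordId;
        Status status;
        int pending;

        BundleAction(String coordId, Status status, int pending) {
            this.coordId = coordId;
            this.status = status;
            this.pending = pending;
        }
    }

    /**
     * Re-issued kill: non-terminal actions get a coord kill queued; terminal
     * actions have no kill to wait for, so their pending is decremented.
     * Returns the coord ids a kill would be queued for.
     */
    static List<String> killBundle(List<BundleAction> actions) {
        List<String> queued = new ArrayList<>();
        for (BundleAction a : actions) {
            if (TERMINAL.contains(a.status)) {
                if (a.pending > 0) {
                    a.pending--; // release the stale pending left behind by the race
                }
            } else {
                a.status = Status.KILLED;
                a.pending++; // a coord kill is now in flight
                queued.add(a.coordId);
            }
        }
        return queued;
    }

    /** Assumed rule: the bundle reaches KILLED only when every action is terminal and none is pending. */
    static boolean canReachKilled(List<BundleAction> actions) {
        for (BundleAction a : actions) {
            if (!TERMINAL.contains(a.status) || a.pending > 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<BundleAction> actions = new ArrayList<>();
        actions.add(new BundleAction("coord-1", Status.KILLED, 1));    // left behind by the race
        actions.add(new BundleAction("coord-2", Status.SUCCEEDED, 0));
        System.out.println("before: " + canReachKilled(actions));      // before: false
        System.out.println("queued: " + killBundle(actions));          // queued: []
        System.out.println("after: " + canReachKilled(actions));       // after: true
    }
}
```

Decrementing rather than forcing pending to 0 keeps the sketch close to the wording above ("decrement the pending"); an action with several in-flight operations would need more than one retry in this model.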
Shwetha,
Do you face a similar issue with suspend? If so, can you open another JIRA
for it?
> Bundle can still be in RUNNINGWITHERROR status after bundle kill
> ----------------------------------------------------------------
>
> Key: OOZIE-1687
> URL: https://issues.apache.org/jira/browse/OOZIE-1687
> Project: Oozie
> Issue Type: Bug
> Affects Versions: 4.0.0
> Reporter: Rohini Palaniswamy
> Assignee: Rohini Palaniswamy
> Fix For: trunk
>
> Attachments: OOZIE-1687-1.patch
>
>
> A race condition between StatusTransitService (which does not acquire a
> lock) and BundleKillCommand can leave a bundle action with pending=1 and a
> terminal status.
--
This message was sent by Atlassian JIRA
(v6.1.5#6160)