[
https://issues.apache.org/jira/browse/OOZIE-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15189640#comment-15189640
]
Ferenc Denes commented on OOZIE-2459:
-------------------------------------
As far as I can see, this test contains 4 subtests.
The logs of some failed runs show that the coordinator job goes to
SUCCEEDED somewhere between or near the 1st and the 2nd subtest. From then on
the precondition check fails:
...
08:38:08,753 INFO CoordActionNotificationXCommand:520 - USER[-] GROUP[-]
TOKEN[-] APP[-] JOB[0000000-160304083808369-oozie-jenk-C]
ACTION[0000000-160304083808369-oozie-jenk-C@1] No Notification URL is defined.
Therefore nothing to notify for job 0000000-160304083808369-oozie-jenk-C@1
08:38:09,626 DEBUG CoordActionInputCheckXCommand:526 - USER[test] GROUP[-]
TOKEN[] APP[no-op-wf] JOB[0000000-160304083808369-oozie-jenk-C]
ACTION[0000000-160304083808369-oozie-jenk-C@2] Acquired lock for
[0000000-160304083808369-oozie-jenk-C] in [coord_action_input]
08:38:09,628 DEBUG CoordActionInputCheckXCommand:526 - USER[test] GROUP[-]
TOKEN[] APP[no-op-wf] JOB[0000000-160304083808369-oozie-jenk-C]
ACTION[0000000-160304083808369-oozie-jenk-C@2] Released lock for
[0000000-160304083808369-oozie-jenk-C] in [coord_action_input]
08:38:09,628 WARN CoordActionInputCheckXCommand:523 - USER[test] GROUP[-]
TOKEN[] APP[no-op-wf] JOB[0000000-160304083808369-oozie-jenk-C]
ACTION[0000000-160304083808369-oozie-jenk-C@2] E1100: Command precondition does
not hold before execution,
[[0000000-160304083808369-oozie-jenk-C@2]::CoordActionInputCheck:: Ignoring
action. Coordinator job is not in
RUNNING/RUNNINGWITHERROR/PAUSED/PAUSEDWITHERROR state, but state=SUCCEEDED],
Error Code: E1100
08:38:09,628 INFO TestCoordActionInputCheckXCommandNonUTC:520 - USER[test]
GROUP[-] TOKEN[] APP[no-op-wf] JOB[0000000-160304083808369-oozie-jenk-C]
ACTION[0000000-160304083808369-oozie-jenk-C@2] Waiting up to [5,000] msec
08:38:09,645 DEBUG CoordActionInputCheckXCommand:526 - USER[test] GROUP[-]
TOKEN[] APP[no-op-wf] JOB[0000000-160304083808369-oozie-jenk-C]
ACTION[0000000-160304083808369-oozie-jenk-C@3] Acquired lock for
[0000000-160304083808369-oozie-jenk-C] in [coord_action_input]
08:38:09,646 DEBUG CoordActionInputCheckXCommand:526 - USER[test] GROUP[-]
TOKEN[] APP[no-op-wf] JOB[0000000-160304083808369-oozie-jenk-C]
ACTION[0000000-160304083808369-oozie-jenk-C@3] Released lock for
[0000000-160304083808369-oozie-jenk-C] in [coord_action_input]
08:38:09,646 WARN CoordActionInputCheckXCommand:523 - USER[test] GROUP[-]
TOKEN[] APP[no-op-wf] JOB[0000000-160304083808369-oozie-jenk-C]
ACTION[0000000-160304083808369-oozie-jenk-C@3] E1100: Command precondition does
not hold before execution,
[[0000000-160304083808369-oozie-jenk-C@3]::CoordActionInputCheck:: Ignoring
action. Coordinator job is not in
RUNNING/RUNNINGWITHERROR/PAUSED/PAUSEDWITHERROR state, but state=SUCCEEDED],
Error Code: E1100
08:38:09,646 INFO TestCoordActionInputCheckXCommandNonUTC:520 - USER[test]
GROUP[-] TOKEN[] APP[no-op-wf] JOB[0000000-160304083808369-oozie-jenk-C]
ACTION[0000000-160304083808369-oozie-jenk-C@3] Waiting up to [5,000] msec
...
08:38:14,657 INFO TestCoordActionInputCheckXCommandNonUTC:520 - USER[test]
GROUP[-] TOKEN[] APP[no-op-wf] JOB[0000000-160304083808369-oozie-jenk-C]
ACTION[0000000-160304083808369-oozie-jenk-C@3] Waiting timed out after [5,000]
msec
08:38:14,662 INFO Services:520 - USER[test] GROUP[-] TOKEN[] APP[no-op-wf]
JOB[0000000-160304083808369-oozie-jenk-C]
ACTION[0000000-160304083808369-oozie-jenk-C@3]
...
This causes the action to remain in WAITING (its initial state), as nothing
checks or handles the actions of a SUCCEEDED coordinator. Another possible
reason is that once the precondition fails, the coordinator is not processed
any more. Either way, the action is not touched.
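To make the effect concrete, here is a minimal sketch of the behaviour the log
suggests. It is not Oozie's actual code: the class, the enums and the
inputCheck method are hypothetical, and the READY target state is an
assumption. Only the four allowed job states and SUCCEEDED are taken from the
E1100 message above.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical model of the precondition check; not Oozie's real classes.
final class PreconditionSketch {

    enum JobStatus { RUNNING, RUNNINGWITHERROR, PAUSED, PAUSEDWITHERROR, SUCCEEDED }

    // READY is an assumed "next" state used only for illustration.
    enum ActionStatus { WAITING, READY }

    // Job states in which the input check may proceed, as listed in the E1100 message.
    static final Set<JobStatus> ALLOWED = EnumSet.of(
            JobStatus.RUNNING, JobStatus.RUNNINGWITHERROR,
            JobStatus.PAUSED, JobStatus.PAUSEDWITHERROR);

    // Returns the action status after one (simplified) input check.
    static ActionStatus inputCheck(JobStatus job, ActionStatus action) {
        if (!ALLOWED.contains(job)) {
            // Precondition does not hold: the command is skipped, the action is untouched.
            return action;
        }
        // Assumption: all inputs are available, so the action moves forward.
        return ActionStatus.READY;
    }

    public static void main(String[] args) {
        System.out.println(inputCheck(JobStatus.RUNNING, ActionStatus.WAITING));   // READY
        System.out.println(inputCheck(JobStatus.SUCCEEDED, ActionStatus.WAITING)); // WAITING
    }
}
```

With the job already in SUCCEEDED, every later check returns the action
unchanged, which matches the test timing out while the action is still WAITING.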
I see two options here:
1. Figure out which service puts the coordinator into SUCCEEDED, and prevent it
from doing that. This might be tricky, but doable. Any ideas what that service is?
2. Split the subtests into 4 distinct tests so they do not interfere with each
other. This would increase the test time by a few seconds, but would result in
a cleaner test.
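A rough illustration of why option 2 should help, under stated assumptions:
FakeJob, the scenario list and both runner methods are hypothetical stand-ins,
not the real test fixture, and the first scenario simply simulates whatever
moves the job to SUCCEEDED. The sketch records the job status each of the 4
subtests starts with, once with a shared job and once with a fresh job per
subtest.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical sketch: shared fixture vs. one fresh fixture per subtest.
final class IsolatedSubtestsSketch {

    // Stand-in for the coordinator job created by the test.
    static final class FakeJob {
        String status = "RUNNING";
    }

    // Four scenarios; the first simulates the job being moved to SUCCEEDED.
    static List<Consumer<FakeJob>> demoScenarios() {
        Consumer<FakeJob> finishesJob = job -> job.status = "SUCCEEDED";
        Consumer<FakeJob> noOp = job -> { };
        return Arrays.asList(finishesJob, noOp, noOp, noOp);
    }

    // Current layout: all subtests run against the same job.
    static List<String> runShared(List<Consumer<FakeJob>> scenarios) {
        FakeJob job = new FakeJob();
        List<String> startStatus = new ArrayList<>();
        for (Consumer<FakeJob> scenario : scenarios) {
            startStatus.add(job.status); // status this subtest starts with
            scenario.accept(job);
        }
        return startStatus;
    }

    // Option 2: each subtest gets its own job, like setUp() before each test method.
    static List<String> runIsolated(List<Consumer<FakeJob>> scenarios) {
        List<String> startStatus = new ArrayList<>();
        for (Consumer<FakeJob> scenario : scenarios) {
            FakeJob job = new FakeJob();
            startStatus.add(job.status);
            scenario.accept(job);
        }
        return startStatus;
    }

    public static void main(String[] args) {
        System.out.println(runShared(demoScenarios()));   // [RUNNING, SUCCEEDED, SUCCEEDED, SUCCEEDED]
        System.out.println(runIsolated(demoScenarios())); // [RUNNING, RUNNING, RUNNING, RUNNING]
    }
}
```

This only shows the isolation effect; it does not answer which service changes
the job state, so option 1 would still need the investigation described above.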
Note:
This might also be the case in the testLastOnly test; however, there the last
few actions are expected to be in WAITING state, so it never fails even if the
coordinator is not in the desired state. As the test never fails, probably no
one has checked it.
> TestCoordActionInputCheckXCommand.testNone is flakey
> ----------------------------------------------------
>
> Key: OOZIE-2459
> URL: https://issues.apache.org/jira/browse/OOZIE-2459
> Project: Oozie
> Issue Type: Bug
> Components: action, tests
> Affects Versions: 4.1.0
> Reporter: Ferenc Denes
> Priority: Minor
> Fix For: trunk
>
>
> TestCoordActionInputCheckXCommand.testNone is flakey.
> Also the TestCoordActionInputCheckXCommandNonUTC.testNone which is very
> similar.