[ 
https://issues.apache.org/jira/browse/OOZIE-2459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15189640#comment-15189640
 ] 

Ferenc Denes commented on OOZIE-2459:
-------------------------------------

As far as I can see, this test contains 4 subtests.
The logs of some failed runs show that the coordinator job goes to 
SUCCEEDED somewhere between or near the 1st and the 2nd subtest. From then on, 
the precondition check fails:

...
08:38:08,753  INFO CoordActionNotificationXCommand:520 - USER[-] GROUP[-] 
TOKEN[-] APP[-] JOB[0000000-160304083808369-oozie-jenk-C] 
ACTION[0000000-160304083808369-oozie-jenk-C@1] No Notification URL is defined. 
Therefore nothing to notify for job 0000000-160304083808369-oozie-jenk-C@1
08:38:09,626 DEBUG CoordActionInputCheckXCommand:526 - USER[test] GROUP[-] 
TOKEN[] APP[no-op-wf] JOB[0000000-160304083808369-oozie-jenk-C] 
ACTION[0000000-160304083808369-oozie-jenk-C@2] Acquired lock for 
[0000000-160304083808369-oozie-jenk-C] in [coord_action_input]
08:38:09,628 DEBUG CoordActionInputCheckXCommand:526 - USER[test] GROUP[-] 
TOKEN[] APP[no-op-wf] JOB[0000000-160304083808369-oozie-jenk-C] 
ACTION[0000000-160304083808369-oozie-jenk-C@2] Released lock for 
[0000000-160304083808369-oozie-jenk-C] in [coord_action_input]
08:38:09,628  WARN CoordActionInputCheckXCommand:523 - USER[test] GROUP[-] 
TOKEN[] APP[no-op-wf] JOB[0000000-160304083808369-oozie-jenk-C] 
ACTION[0000000-160304083808369-oozie-jenk-C@2] E1100: Command precondition does 
not hold before execution, 
[[0000000-160304083808369-oozie-jenk-C@2]::CoordActionInputCheck:: Ignoring 
action. Coordinator job is not in 
RUNNING/RUNNINGWITHERROR/PAUSED/PAUSEDWITHERROR state, but state=SUCCEEDED], 
Error Code: E1100
08:38:09,628  INFO TestCoordActionInputCheckXCommandNonUTC:520 - USER[test] 
GROUP[-] TOKEN[] APP[no-op-wf] JOB[0000000-160304083808369-oozie-jenk-C] 
ACTION[0000000-160304083808369-oozie-jenk-C@2] Waiting up to [5,000] msec
08:38:09,645 DEBUG CoordActionInputCheckXCommand:526 - USER[test] GROUP[-] 
TOKEN[] APP[no-op-wf] JOB[0000000-160304083808369-oozie-jenk-C] 
ACTION[0000000-160304083808369-oozie-jenk-C@3] Acquired lock for 
[0000000-160304083808369-oozie-jenk-C] in [coord_action_input]
08:38:09,646 DEBUG CoordActionInputCheckXCommand:526 - USER[test] GROUP[-] 
TOKEN[] APP[no-op-wf] JOB[0000000-160304083808369-oozie-jenk-C] 
ACTION[0000000-160304083808369-oozie-jenk-C@3] Released lock for 
[0000000-160304083808369-oozie-jenk-C] in [coord_action_input]
08:38:09,646  WARN CoordActionInputCheckXCommand:523 - USER[test] GROUP[-] 
TOKEN[] APP[no-op-wf] JOB[0000000-160304083808369-oozie-jenk-C] 
ACTION[0000000-160304083808369-oozie-jenk-C@3] E1100: Command precondition does 
not hold before execution, 
[[0000000-160304083808369-oozie-jenk-C@3]::CoordActionInputCheck:: Ignoring 
action. Coordinator job is not in 
RUNNING/RUNNINGWITHERROR/PAUSED/PAUSEDWITHERROR state, but state=SUCCEEDED], 
Error Code: E1100
08:38:09,646  INFO TestCoordActionInputCheckXCommandNonUTC:520 - USER[test] 
GROUP[-] TOKEN[] APP[no-op-wf] JOB[0000000-160304083808369-oozie-jenk-C] 
ACTION[0000000-160304083808369-oozie-jenk-C@3] Waiting up to [5,000] msec
...
08:38:14,657  INFO TestCoordActionInputCheckXCommandNonUTC:520 - USER[test] 
GROUP[-] TOKEN[] APP[no-op-wf] JOB[0000000-160304083808369-oozie-jenk-C] 
ACTION[0000000-160304083808369-oozie-jenk-C@3] Waiting timed out after [5,000] 
msec
08:38:14,662  INFO Services:520 - USER[test] GROUP[-] TOKEN[] APP[no-op-wf] 
JOB[0000000-160304083808369-oozie-jenk-C] 
ACTION[0000000-160304083808369-oozie-jenk-C@3]
...

This leaves the action in WAITING (its initial state), because nothing 
checks or handles the actions of a SUCCEEDED coordinator. Another possible 
reason is that once the precondition fails, the coordinator is not processed 
any more. Either way, the action is not touched.
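
To make the failure mode concrete, here is a minimal sketch of the behaviour 
seen in the log. It is a simplified model, not Oozie's real code: the class 
InputCheckSketch, its enums and the check method are hypothetical names, and 
the WAITING to READY transition on a successful check is my assumption. Only 
the list of accepted job states is taken from the E1100 message.

```java
import java.util.EnumSet;
import java.util.Set;

// Hypothetical, simplified model; these are not Oozie's real classes.
class InputCheckSketch {
    enum JobState { RUNNING, RUNNINGWITHERROR, PAUSED, PAUSEDWITHERROR, SUCCEEDED }
    enum ActionState { WAITING, READY }

    // Job states the E1100 message lists as acceptable for the input check.
    static final Set<JobState> ALLOWED = EnumSet.of(
            JobState.RUNNING, JobState.RUNNINGWITHERROR,
            JobState.PAUSED, JobState.PAUSEDWITHERROR);

    // Returns the action state after one input-check attempt.
    static ActionState check(JobState job, ActionState action, boolean inputsAvailable) {
        if (!ALLOWED.contains(job)) {
            // Precondition fails: the command is skipped, the action is untouched.
            return action;
        }
        // Assumed transition: a successful check moves the action to READY.
        return inputsAvailable ? ActionState.READY : action;
    }

    public static void main(String[] args) {
        System.out.println(check(JobState.RUNNING, ActionState.WAITING, true));
        System.out.println(check(JobState.SUCCEEDED, ActionState.WAITING, true));
    }
}
```

In this model the second call returns WAITING even though the inputs are 
available, which is the state the test then times out on.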

I see two options here:
1. Figure out which service moves the coordinator to SUCCEEDED, and prevent it 
from doing that. This might be tricky, but doable. Any ideas what that service is?
2. Split the subtests into 4 distinct tests so they do not interfere with each 
other. This would increase the test time by a few seconds, but would result in 
a cleaner test.
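
A rough illustration of why the second option helps, under stated 
assumptions. Fixture, subtest, runShared and runIsolated are hypothetical 
names, and the line that flips the job to SUCCEEDED merely stands in for 
whatever service actually does it. The sketch only contrasts a shared 
coordinator job with a fresh one per test.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; not the real Oozie test code.
class IsolatedSubtestsSketch {
    // Stand-in for a coordinator job and the action under test.
    static final class Fixture {
        String jobState = "RUNNING";
        String actionState = "WAITING";
    }

    // One subtest: run the input check, then simulate the job finishing.
    static boolean subtest(Fixture f) {
        boolean preconditionHolds = f.jobState.equals("RUNNING");
        if (preconditionHolds) {
            f.actionState = "READY";
        }
        boolean passed = f.actionState.equals("READY");
        f.jobState = "SUCCEEDED"; // assumed side effect that leaks into later subtests
        return passed;
    }

    // Current layout: four subtests share one coordinator job.
    static List<Boolean> runShared() {
        Fixture shared = new Fixture();
        List<Boolean> results = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            shared.actionState = "WAITING"; // each subtest starts with a WAITING action
            results.add(subtest(shared));
        }
        return results;
    }

    // Option 2: every test builds its own coordinator job.
    static List<Boolean> runIsolated() {
        List<Boolean> results = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            results.add(subtest(new Fixture()));
        }
        return results;
    }

    public static void main(String[] args) {
        System.out.println("shared:   " + runShared());
        System.out.println("isolated: " + runIsolated());
    }
}
```

With the shared fixture only the first subtest passes in this model. With 
one fixture per test, all four pass, because a job that has already finished 
cannot affect a later test.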


Note:
This might also be the case in the testLastOnly test. There, however, the last 
few actions are expected to be in WAITING state, so the test never fails even 
if the coordinator is not in the desired state. As the test never fails, 
probably nobody has checked it.



> TestCoordActionInputCheckXCommand.testNone is flakey
> ----------------------------------------------------
>
>                 Key: OOZIE-2459
>                 URL: https://issues.apache.org/jira/browse/OOZIE-2459
>             Project: Oozie
>          Issue Type: Bug
>          Components: action, tests
>    Affects Versions: 4.1.0
>            Reporter: Ferenc Denes
>            Priority: Minor
>             Fix For: trunk
>
>
> TestCoordActionInputCheckXCommand.testNone is flakey.
> Also the TestCoordActionInputCheckXCommandNonUTC.testNone which is very 
> similar.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
