Haibo Chen created MAPREDUCE-6675:
-------------------------------------
Summary: TestJobImpl.testUnusableNode failed
Key: MAPREDUCE-6675
URL: https://issues.apache.org/jira/browse/MAPREDUCE-6675
Project: Hadoop Map/Reduce
Issue Type: Bug
Components: mrv2
Affects Versions: 2.7.3
Reporter: Haibo Chen
Assignee: Haibo Chen
TestJobImpl#testUnusableNodeTransition is flaky.
2016-02-13 09:16:42 Running
org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl
2016-02-13 09:16:50 Tests run: 17, Failures: 1, Errors: 0, Skipped: 0, Time
elapsed: 8.324 sec <<< FAILURE! - in
org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl
2016-02-13 09:16:50
testUnusableNodeTransition(org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl)
Time elapsed: 5.165 sec <<< FAILURE!
2016-02-13 09:16:50 java.lang.AssertionError: expected:<SUCCEEDED> but
was:<ERROR>
2016-02-13 09:16:50 at org.junit.Assert.fail(Assert.java:88)
2016-02-13 09:16:50 at org.junit.Assert.failNotEquals(Assert.java:743)
2016-02-13 09:16:50 at org.junit.Assert.assertEquals(Assert.java:118)
2016-02-13 09:16:50 at org.junit.Assert.assertEquals(Assert.java:144)
2016-02-13 09:16:50 at
org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl.assertJobState(TestJobImpl.java:977)
2016-02-13 09:16:50 at
org.apache.hadoop.mapreduce.v2.app.job.impl.TestJobImpl.testUnusableNodeTransition(TestJobImpl.java:627)
2016-02-13 09:16:50
2016-02-13 09:16:50
2016-02-13 09:16:50 Results :
2016-02-13 09:16:50
2016-02-13 09:16:50 Failed tests:
2016-02-13 09:16:50
TestJobImpl.testUnusableNodeTransition:627->assertJobState:977
expected:<SUCCEEDED> but was:<ERROR>
2016-02-13 09:16:50
2016-02-13 09:16:50 Tests run: 17, Failures: 1, Errors: 0, Skipped: 0.
Looking at the code, a JobUpdatedNodesEvent is handled by putting a
TaskAttemptKill event on the async dispatcher queue and returning immediately, but
that event might not have been processed by the time all JobTaskEvents
are seen by the job (the jobTaskSucceeded events are handed to the Job immediately
without going through the dispatcher). Therefore, there is a slight chance that
the job will see all three succeeded attempts and transition to the Committing
state before the TaskAttemptKill event is handled by the dispatcher. A Committing
job rejects any JobTaskEvents it receives afterwards, which causes the failure.
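
To make the ordering problem concrete, here is a minimal, deterministic sketch of the
race. It does not use any Hadoop classes: ToyJob, its State enum, the Runnable queue,
and the run/drain helpers are all hypothetical stand-ins for JobImpl, its state
machine, and the async dispatcher. The toy job's handling of a late kill and of a
late success event is also a simplifying assumption, chosen only to reproduce the
SUCCEEDED versus ERROR outcome seen in the log.

```java
import java.util.ArrayDeque;
import java.util.Queue;

class DispatcherRaceSketch {
    enum State { RUNNING, COMMITTING, SUCCEEDED, ERROR }

    // Hypothetical toy job; NOT the real JobImpl state machine.
    static class ToyJob {
        State state = State.RUNNING;
        int succeededTasks = 0;
        final int totalTasks;

        ToyJob(int totalTasks) { this.totalTasks = totalTasks; }

        // Delivered directly to the job, bypassing the dispatcher queue.
        void taskSucceeded() {
            if (state != State.RUNNING) { state = State.ERROR; return; } // rejected
            if (++succeededTasks == totalTasks) state = State.COMMITTING;
        }

        // Delivered via the queue: the attempt on the unusable node is killed,
        // so its task no longer counts as succeeded and must rerun.
        void attemptKilled() {
            if (state != State.RUNNING) return; // arrived too late to matter
            succeededTasks--;
        }

        void commit() {
            if (state == State.COMMITTING) state = State.SUCCEEDED;
        }
    }

    // Runs every event currently sitting in the (simulated) dispatcher queue.
    static void drain(Queue<Runnable> queue) {
        Runnable event;
        while ((event = queue.poll()) != null) event.run();
    }

    static State run(boolean drainBeforeTaskEvents) {
        ToyJob job = new ToyJob(3);
        Queue<Runnable> dispatcherQueue = new ArrayDeque<>();

        job.taskSucceeded();                     // task 1 succeeds on a node
        dispatcherQueue.add(job::attemptKilled); // node update only enqueues the kill
        if (drainBeforeTaskEvents) drain(dispatcherQueue);

        job.taskSucceeded();                     // task 2, handed over directly
        job.taskSucceeded();                     // task 3, handed over directly
        drain(dispatcherQueue);                  // dispatcher eventually catches up
        job.taskSucceeded();                     // rerun of task 1 reports success
        job.commit();
        return job.state;
    }

    public static void main(String[] args) {
        System.out.println("kill handled late:  " + run(false)); // ERROR
        System.out.println("kill handled first: " + run(true));  // SUCCEEDED
    }
}
```

In this sketch the only difference between the two runs is whether the queue is
drained before the direct success events are delivered. That suggests one possible
direction for a test-side fix, namely waiting for the dispatcher to process the
queued kill before sending further events, though the actual fix is not stated here.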