[ https://issues.apache.org/jira/browse/MAPREDUCE-6801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15675040#comment-15675040 ]
Varun Saxena edited comment on MAPREDUCE-6801 at 11/17/16 10:49 PM:
--------------------------------------------------------------------

Thanks [~haibochen] for the patch.
This should handle all the cases except one, which should happen only rarely.
If the job is stuck in the internal state SETUP (due to slow processing), tasks won't be scheduled. Hence, the tasks won't reach the KILLED state that we assert on.
An internal state of SETUP corresponds to an external state of RUNNING. Therefore {{app.waitForState(job, JobState.RUNNING)}} should be replaced by {{app.waitForInternalState((JobImpl) job, JobStateInternal.RUNNING)}}.

I was able to simulate this case by putting a sleep in the dispatcher.


was (Author: varun_saxena):
Thanks [~haibochen] for the patch.
This should handle all the cases except one, although rarely.
If the job is stuck in the internal state SETUP (due to slow processing), tasks won't be scheduled. Hence, the tasks won't reach the KILLED state that we assert on.
An internal state of SETUP corresponds to an external state of RUNNING. Therefore {{app.waitForState(job, JobState.RUNNING)}} should be replaced by {{app.waitForInternalState((JobImpl) job, JobStateInternal.RUNNING)}}.

I was able to simulate this case by putting a sleep in the dispatcher.

> Fix flaky TestKill.testKillJob()
> --------------------------------
>
>                 Key: MAPREDUCE-6801
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6801
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Haibo Chen
>            Assignee: Haibo Chen
>         Attachments: mapreduce6801.001.patch
>
>
> TestKill.testKillJob often fails for the same reason with the following error
> message:
> {code}
> 1 tests failed.
> FAILED:  org.apache.hadoop.mapreduce.v2.app.TestKill.testKillJob
> Error Message:
> Task state not correct expected:<KILLED> but was:<NEW/SCHEDULED/RUNNING>
> Stack Trace:
> java.lang.AssertionError: Task state not correct expected:<KILLED> but was:<NEW/SCHEDULED/RUNNING>
>       at org.junit.Assert.fail(Assert.java:88)
>       at org.junit.Assert.failNotEquals(Assert.java:743)
>       at org.junit.Assert.assertEquals(Assert.java:118)
>       at org.apache.hadoop.mapreduce.v2.app.TestKill.testKillJob(TestKill.java:84)
> {code}
> The root cause is that when the job is in KILLED state from an external view,
> TaskKillEvents and TaskAttemptKillEvents placed on the event loop queue may
> not have been processed by the dispatcher thread.
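
The SETUP corner case from the comment can be illustrated with a small toy model. This is only a sketch under stated assumptions, not Hadoop code: {{JobStateSketch}}, {{ToyJob}}, the three event names and {{dispatchOne()}} are all hypothetical, and the state mapping is reduced to the one fact stated in the comment (internal SETUP is reported externally as RUNNING). The queue is drained by hand to stand in for a slow dispatcher thread.

```java
import java.util.ArrayDeque;
import java.util.Queue;

// Toy model only: none of these types exist in Hadoop.
public class JobStateSketch {
    enum InternalState { NEW, SETUP, RUNNING, KILLED }
    enum ExternalState { NEW, RUNNING, KILLED }
    enum Event { INIT, SETUP_COMPLETED, KILL }

    static class ToyJob {
        private final Queue<Event> queue = new ArrayDeque<>();
        private InternalState internal = InternalState.NEW;
        private boolean tasksScheduled = false;
        private boolean tasksKilled = false;

        // Events are only queued here; nothing changes until they are dispatched.
        void enqueue(Event e) { queue.add(e); }

        // Process exactly one queued event, standing in for one dispatcher step.
        // Returns false when the queue is empty.
        boolean dispatchOne() {
            Event e = queue.poll();
            if (e == null) {
                return false;
            }
            switch (e) {
                case INIT:
                    if (internal == InternalState.NEW) {
                        internal = InternalState.SETUP;
                    }
                    break;
                case SETUP_COMPLETED:
                    if (internal == InternalState.SETUP) {
                        internal = InternalState.RUNNING;
                        tasksScheduled = true; // tasks exist only after setup
                    }
                    break;
                case KILL:
                    // Assumption of this sketch: only scheduled tasks can be killed.
                    tasksKilled = tasksScheduled;
                    internal = InternalState.KILLED;
                    break;
            }
            return true;
        }

        InternalState internalState() { return internal; }
        boolean tasksScheduled() { return tasksScheduled; }
        boolean tasksKilled() { return tasksKilled; }

        // Internal SETUP is indistinguishable from RUNNING for an external observer.
        ExternalState externalState() {
            switch (internal) {
                case NEW: return ExternalState.NEW;
                case SETUP:
                case RUNNING: return ExternalState.RUNNING;
                default: return ExternalState.KILLED;
            }
        }
    }

    public static void main(String[] args) {
        ToyJob job = new ToyJob();
        job.enqueue(Event.INIT);
        job.enqueue(Event.SETUP_COMPLETED);

        job.dispatchOne(); // slow dispatcher: only INIT has been handled so far
        System.out.println(job.externalState() + " / " + job.internalState()
            + " / scheduled=" + job.tasksScheduled());
        // prints "RUNNING / SETUP / scheduled=false"

        job.dispatchOne(); // SETUP_COMPLETED handled
        System.out.println(job.externalState() + " / " + job.internalState()
            + " / scheduled=" + job.tasksScheduled());
        // prints "RUNNING / RUNNING / scheduled=true"
    }
}
```

In this model, a test that waits only for the external RUNNING state can proceed while the job is still in SETUP, so a kill issued at that point finds no scheduled tasks to kill. Waiting for the internal RUNNING state, as the comment proposes, rules that window out.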