[ https://issues.apache.org/jira/browse/MAPREDUCE-6801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15675040#comment-15675040 ]

Varun Saxena edited comment on MAPREDUCE-6801 at 11/17/16 10:49 PM:
--------------------------------------------------------------------

Thanks [~haibochen] for the patch. This should handle all the cases except one, 
although that case would happen rarely. If the job is stuck in the internal 
state SETUP (due to slow processing), tasks won't be scheduled. Hence, the tasks 
won't reach the KILLED state that we assert on. An internal state of SETUP 
corresponds to an external state of RUNNING, so waiting for the external state 
alone is not enough. Therefore {{app.waitForState(job, JobState.RUNNING)}} 
should be replaced by {{app.waitForInternalState((JobImpl) job, 
JobStateInternal.RUNNING)}}.

I was able to simulate this case by putting a sleep in the dispatcher.
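Why waiting for the external state is insufficient can be shown with a small, hypothetical model. The enums and methods below are illustrative stand-ins, not Hadoop's real {{JobStateInternal}}/{{JobState}} types or the {{MRApp}} wait helpers. The only mapping taken from the comment is that internal SETUP is reported externally as RUNNING; the other states are a simplification.

```java
// Hypothetical, simplified model of the internal-to-external job state mapping
// described above. These are NOT the real Hadoop JobStateInternal/JobState enums.
public class JobStateMappingSketch {

    enum InternalState { NEW, SETUP, RUNNING, KILLED }

    enum ExternalState { NEW, RUNNING, KILLED }

    // SETUP and RUNNING collapse into the same external state, so observing
    // ExternalState.RUNNING does not tell us whether tasks have been scheduled.
    static ExternalState toExternal(InternalState s) {
        switch (s) {
            case NEW:
                return ExternalState.NEW;
            case SETUP:
            case RUNNING:
                return ExternalState.RUNNING;
            case KILLED:
                return ExternalState.KILLED;
            default:
                throw new IllegalArgumentException("unknown state " + s);
        }
    }

    // Analogue of waiting on the external state: already satisfied during SETUP.
    static boolean externalWaitSatisfied(InternalState current) {
        return toExternal(current) == ExternalState.RUNNING;
    }

    // Analogue of waiting on the internal state: satisfied only once really RUNNING.
    static boolean internalWaitSatisfied(InternalState current) {
        return current == InternalState.RUNNING;
    }

    public static void main(String[] args) {
        InternalState stuck = InternalState.SETUP;
        System.out.println(externalWaitSatisfied(stuck)); // true
        System.out.println(internalWaitSatisfied(stuck)); // false
    }
}
```

With the job stuck in SETUP, the external wait returns early while no tasks exist yet, which is exactly the window in which the later KILLED assertion on the tasks can fail.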



> Fix flaky TestKill.testKillJob()
> --------------------------------
>
>                 Key: MAPREDUCE-6801
>                 URL: https://issues.apache.org/jira/browse/MAPREDUCE-6801
>             Project: Hadoop Map/Reduce
>          Issue Type: Bug
>          Components: mrv2
>    Affects Versions: 3.0.0-alpha1
>            Reporter: Haibo Chen
>            Assignee: Haibo Chen
>         Attachments: mapreduce6801.001.patch
>
>
> TestKill.testKillJob often fails for the same reason with the following error 
> message:
> {code}
> 1 tests failed.
> FAILED:  org.apache.hadoop.mapreduce.v2.app.TestKill.testKillJob
> Error Message:
> Task state not correct expected:<KILLED> but was:<NEW/SCHEDULED/RUNNING>
> Stack Trace:
> java.lang.AssertionError: Task state not correct expected:<KILLED> but 
> was:<NEW/SCHEDULED/RUNNING>
>       at org.junit.Assert.fail(Assert.java:88)
>       at org.junit.Assert.failNotEquals(Assert.java:743)
>       at org.junit.Assert.assertEquals(Assert.java:118)
>       at 
> org.apache.hadoop.mapreduce.v2.app.TestKill.testKillJob(TestKill.java:84)
> {code}
> The root cause is that when the job is already in the KILLED state from an 
> external view, the TaskKillEvents and TaskAttemptKillEvents placed on the 
> event loop queue may not yet have been processed by the dispatcher thread.
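The race in the root-cause description can be reproduced deterministically with a hypothetical, single-threaded model. It is not Hadoop's {{AsyncDispatcher}}; the class, fields and methods are illustrative names only, and events are dispatched manually so the interleaving is fixed. Handling the job-level kill updates the job state right away but only enqueues the per-task kill events, so a check made in between sees a KILLED job with RUNNING tasks.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical model of the kill race; not Hadoop's dispatcher or JobImpl.
public class KillRaceSketch {

    enum TaskState { RUNNING, KILLED }

    static final class Task {
        TaskState state = TaskState.RUNNING;
    }

    final Deque<Runnable> eventQueue = new ArrayDeque<>();
    final List<Task> tasks = new ArrayList<>();
    String jobState = "RUNNING";

    KillRaceSketch(int numTasks) {
        for (int i = 0; i < numTasks; i++) {
            tasks.add(new Task());
        }
    }

    // The job state visible to the test flips immediately, while the
    // per-task kill events are only placed on the queue.
    void handleJobKill() {
        jobState = "KILLED";
        for (Task t : tasks) {
            eventQueue.add(() -> t.state = TaskState.KILLED);
        }
    }

    // Processes one pending event, as the dispatcher thread eventually would.
    boolean dispatchOne() {
        Runnable e = eventQueue.poll();
        if (e == null) {
            return false;
        }
        e.run();
        return true;
    }

    // Keeps dispatching until no events remain.
    void drain() {
        while (dispatchOne()) {
            // loop body intentionally empty
        }
    }

    public static void main(String[] args) {
        KillRaceSketch app = new KillRaceSketch(2);
        app.handleJobKill();
        // The job already looks KILLED, yet the task kill events are still queued.
        System.out.println(app.jobState + " " + app.tasks.get(0).state); // KILLED RUNNING
        app.drain();
        System.out.println(app.jobState + " " + app.tasks.get(0).state); // KILLED KILLED
    }
}
```

In this model, asserting on task state is only safe after the queued events have been processed, which is the ordering the test has to wait for.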



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
