[ 
https://issues.apache.org/jira/browse/TEZ-2816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14742784#comment-14742784
 ] 

Bikas Saha commented on TEZ-2816:
---------------------------------

The core issue is a bug in the scheduler preemption code. I am not yet sure why 
it does not reproduce in master.
Once a task needs preemption, we wait 3 heartbeats for the RM to respond with 
new containers before preempting a different task on its behalf. The preemption 
test checks for this.
However, if no preemption eventually happens, the counter that tracks the last 
preemption can be left holding a stale heartbeat count. The next time 
preemption is needed, that stale count can trigger it immediately instead of 
after the 3-heartbeat wait.
I fixed the code so that this counter always stays in sync with the heartbeat 
counter, except while there is a preemption candidate and we are waiting out 
the 3 heartbeats. I also fixed the test case to verify this.
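
To illustrate the stale-counter behaviour described above, here is a minimal 
sketch. It is not the code from TEZ-2816.1.patch: the class PreemptionGate, its 
fields, and the syncWhenIdle flag are hypothetical names made up for this 
example, and the real scheduler logic may differ in its details.

```java
/**
 * Hypothetical sketch of the heartbeat bookkeeping discussed in this comment.
 * Not taken from the Tez source or from the attached patch.
 */
class PreemptionGate {
    private final int heartbeatsToWait;  // 3 in the description above
    private final boolean syncWhenIdle;  // true = fixed behaviour, false = stale counter
    private long numHeartbeats = 0;
    private long heartbeatAtLastPreemption = 0;

    PreemptionGate(int heartbeatsToWait, boolean syncWhenIdle) {
        this.heartbeatsToWait = heartbeatsToWait;
        this.syncWhenIdle = syncWhenIdle;
    }

    /** Called once per heartbeat. Returns true if a task should be preempted now. */
    boolean onHeartbeat(boolean preemptionNeeded) {
        numHeartbeats++;
        if (!preemptionNeeded) {
            // The fix: with no preemption candidate, keep the counter in sync
            // so that a later wait always starts from the current heartbeat.
            if (syncWhenIdle) {
                heartbeatAtLastPreemption = numHeartbeats;
            }
            return false;
        }
        if (numHeartbeats - heartbeatAtLastPreemption <= heartbeatsToWait) {
            // Still giving the RM a chance to hand out new containers.
            return false;
        }
        heartbeatAtLastPreemption = numHeartbeats;
        return true;
    }
}
```

In this sketch, after ten idle heartbeats a gate built with syncWhenIdle = 
false preempts on the very first heartbeat that needs it, while a gate built 
with syncWhenIdle = true holds off for three more heartbeats first.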
[~zjffdu] Please review.

> Build with hadoop 2.4 fails
> ---------------------------
>
>                 Key: TEZ-2816
>                 URL: https://issues.apache.org/jira/browse/TEZ-2816
>             Project: Apache Tez
>          Issue Type: Bug
>            Reporter: Jeff Zhang
>            Assignee: Bikas Saha
>         Attachments: TEZ-2816.1.patch
>
>
> https://builds.apache.org/job/Tez-Build-Hadoop-2.4/170/console
> {noformat}
> Running org.apache.tez.analyzer.TestAnalyzer
> Tests run: 13, Failures: 2, Errors: 0, Skipped: 0, Time elapsed: 99.595 sec <<< FAILURE!
> testBasicInputFailureWithoutExit(org.apache.tez.analyzer.TestAnalyzer)  Time elapsed: 6.276 sec  <<< FAILURE!
> java.lang.AssertionError: v2 : 000000_0
>       at org.junit.Assert.fail(Assert.java:88)
>       at org.junit.Assert.assertTrue(Assert.java:41)
>       at org.apache.tez.analyzer.TestAnalyzer.verifyCriticalPath(TestAnalyzer.java:273)
>       at org.apache.tez.analyzer.TestAnalyzer.runDAGAndVerify(TestAnalyzer.java:220)
>       at org.apache.tez.analyzer.TestAnalyzer.testBasicInputFailureWithoutExit(TestAnalyzer.java:399)
> testCascadingInputFailureWithExitSuccess(org.apache.tez.analyzer.TestAnalyzer)  Time elapsed: 5.986 sec  <<< FAILURE!
> java.lang.AssertionError: v3 : 000000_1
>       at org.junit.Assert.fail(Assert.java:88)
>       at org.junit.Assert.assertTrue(Assert.java:41)
>       at org.apache.tez.analyzer.TestAnalyzer.verifyCriticalPath(TestAnalyzer.java:273)
>       at org.apache.tez.analyzer.TestAnalyzer.runDAGAndVerify(TestAnalyzer.java:220)
>       at org.apache.tez.analyzer.TestAnalyzer.testCascadingInputFailureWithExitSuccess(TestAnalyzer.java:561)
> Results :
> Failed tests: 
>   TestAnalyzer.testBasicInputFailureWithoutExit:399->runDAGAndVerify:220->verifyCriticalPath:273 v2 : 000000_0
>   TestAnalyzer.testCascadingInputFailureWithExitSuccess:561->runDAGAndVerify:220->verifyCriticalPath:273 v3 : 000000_1
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
