Repository: spark
Updated Branches:
  refs/heads/master 9bd80ad6b -> cafc696d0

[HOTFIX][CORE] fix flaky BasicSchedulerIntegrationTest

## What changes were proposed in this pull request?

SPARK-15927 exacerbated a race in BasicSchedulerIntegrationTest, so it went from very unlikely to fairly frequent. The issue is that stage numbering is not completely deterministic, but these tests treated it like it was. So turn off the tests.

## How was this patch tested?

On my laptop the test failed about 10% of the time before this change, and didn't fail in 500 runs after the change.

Author: Imran Rashid <[email protected]>

Closes #13688 from squito/hotfix_basic_scheduler.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/cafc696d
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/cafc696d
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/cafc696d

Branch: refs/heads/master
Commit: cafc696d095ae06dd64805574d55a19637743aa6
Parents: 9bd80ad
Author: Imran Rashid <[email protected]>
Authored: Wed Jun 15 16:44:18 2016 -0500
Committer: Imran Rashid <[email protected]>
Committed: Wed Jun 15 16:44:18 2016 -0500

----------------------------------------------------------------------
 .../spark/scheduler/SchedulerIntegrationSuite.scala | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/cafc696d/core/src/test/scala/org/apache/spark/scheduler/SchedulerIntegrationSuite.scala
----------------------------------------------------------------------
diff --git a/core/src/test/scala/org/apache/spark/scheduler/SchedulerIntegrationSuite.scala b/core/src/test/scala/org/apache/spark/scheduler/SchedulerIntegrationSuite.scala
index 54b7312..12dfa56 100644
--- a/core/src/test/scala/org/apache/spark/scheduler/SchedulerIntegrationSuite.scala
+++ b/core/src/test/scala/org/apache/spark/scheduler/SchedulerIntegrationSuite.scala
@@ -518,10 +518,11 @@ class BasicSchedulerIntegrationSuite extends SchedulerIntegrationSuite[SingleCor

       // make sure the required map output is available
       task.stageId match {
-        case 1 => assertMapOutputAvailable(b)
-        case 3 => assertMapOutputAvailable(c)
         case 4 => assertMapOutputAvailable(d)
-        case _ => // no shuffle map input, nothing to check
+        case _ =>
+        // we can't check for the output for the two intermediate stages, unfortunately,
+        // b/c the stage numbering is non-deterministic, so stage number alone doesn't tell
+        // us what to check
       }

       (task.stageId, task.stageAttemptId, task.partitionId) match {
@@ -557,11 +558,9 @@ class BasicSchedulerIntegrationSuite extends SchedulerIntegrationSuite[SingleCor
       val (taskDescription, task) = backend.beginTask()
       stageToAttempts.getOrElseUpdate(task.stageId, new HashSet()) += task.stageAttemptId

-      // make sure the required map output is available
-      task.stageId match {
-        case 1 => assertMapOutputAvailable(shuffledRdd)
-        case _ => // no shuffle map input, nothing to check
-      }
+      // We cannot check if shuffle output is available, because the failed fetch will clear the
+      // shuffle output. Then we'd have a race, between the already-started task from the first
+      // attempt, and when the failure clears out the map output status.

       (task.stageId, task.stageAttemptId, task.partitionId) match {
         case (0, _, _) =>


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
