Repository: spark
Updated Branches:
  refs/heads/master cccd64393 -> ac013ea58


[SPARK-18846][SCHEDULER] Fix flakiness in SchedulerIntegrationSuite

There is a small race in SchedulerIntegrationSuite.
The test assumes that the task scheduler thread
processing the last task will finish before the DAGScheduler processes
the task event and notifies the job waiter, but that ordering is not
guaranteed.
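
For illustration only, the race can be sketched with plain JDK threads.
Nothing below is Spark or ScalaTest code: RaceSketch, pollUntil, demo,
taskSetRunning and jobDone are hypothetical names, and the 50 ms sleep is an
assumed stand-in for the bookkeeping the task scheduler thread still has to
do after the event is posted. pollUntil only approximates the retry idea
behind ScalaTest's eventually.

```scala
import java.util.concurrent.{CountDownLatch, TimeUnit}
import java.util.concurrent.atomic.AtomicBoolean

object RaceSketch {
  // Hypothetical helper: re-evaluate `check` every `intervalMs` until it
  // holds or `timeoutMs` has elapsed. Returns the last observed result.
  def pollUntil(timeoutMs: Long, intervalMs: Long)(check: => Boolean): Boolean = {
    val deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMs)
    var ok = check
    while (!ok && System.nanoTime() < deadline) {
      Thread.sleep(intervalMs)
      ok = check
    }
    ok
  }

  def demo(): Boolean = {
    // Stands in for "runningTaskSets is non-empty".
    val taskSetRunning = new AtomicBoolean(true)
    // Stands in for the job waiter being notified.
    val jobDone = new CountDownLatch(1)

    val schedulerThread = new Thread(new Runnable {
      override def run(): Unit = {
        jobDone.countDown()        // event posted; the waiter may wake up now
        Thread.sleep(50)           // assumed leftover work in this thread
        taskSetRunning.set(false)  // only now is the task set marked complete
      }
    })
    schedulerThread.start()

    jobDone.await()
    // Checking taskSetRunning immediately here could still observe `true`,
    // so poll for a short while instead of asserting right away.
    val settled = pollUntil(timeoutMs = 1000, intervalMs = 10)(!taskSetRunning.get)
    schedulerThread.join()
    settled
  }

  def main(args: Array[String]): Unit =
    println(s"task set cleared within timeout: ${demo()}")
}
```

The point of the sketch is that the waiter can wake up before the flag
flips, so a bounded retry is a better fit than a single immediate check.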

I ran the test locally many times and it never failed, though admittedly
it never failed locally for me before either.  However, I am nearly 100%
certain this is what caused the failure of one Jenkins build:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/68694/consoleFull
(the build log is long gone now, sorry; I initially fixed this as part of
https://github.com/apache/spark/pull/14079)

Author: Imran Rashid <[email protected]>

Closes #16270 from squito/sched_integ_flakiness.


Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ac013ea5
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/ac013ea5
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/ac013ea5

Branch: refs/heads/master
Commit: ac013ea58933a2057475f7accd197b9ed01b495e
Parents: cccd643
Author: Imran Rashid <[email protected]>
Authored: Wed Dec 14 12:26:49 2016 -0600
Committer: Imran Rashid <[email protected]>
Committed: Wed Dec 14 12:27:01 2016 -0600

----------------------------------------------------------------------
 .../spark/scheduler/SchedulerIntegrationSuite.scala   | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/spark/blob/ac013ea5/core/src/test/scala/org/apache/spark/scheduler/SchedulerIntegrationSuite.scala
----------------------------------------------------------------------
diff --git a/core/src/test/scala/org/apache/spark/scheduler/SchedulerIntegrationSuite.scala b/core/src/test/scala/org/apache/spark/scheduler/SchedulerIntegrationSuite.scala
index c28aa06..2ba63da 100644
--- a/core/src/test/scala/org/apache/spark/scheduler/SchedulerIntegrationSuite.scala
+++ b/core/src/test/scala/org/apache/spark/scheduler/SchedulerIntegrationSuite.scala
@@ -28,6 +28,8 @@ import scala.reflect.ClassTag
 
 import org.scalactic.TripleEquals
 import org.scalatest.Assertions.AssertionsHelper
+import org.scalatest.concurrent.Eventually._
+import org.scalatest.time.SpanSugar._
 
 import org.apache.spark._
 import org.apache.spark.TaskState._
@@ -157,8 +159,16 @@ abstract class SchedulerIntegrationSuite[T <: MockBackend: ClassTag] extends Spa
       }
       // When a job fails, we terminate before waiting for all the task end events to come in,
       // so there might still be a running task set.  So we only check these conditions
-      // when the job succeeds
-      assert(taskScheduler.runningTaskSets.isEmpty)
+      // when the job succeeds.
+      // When the final task of a taskset completes, we post
+      // the event to the DAGScheduler event loop before we finish processing in the taskscheduler
+      // thread.  It's possible the DAGScheduler thread processes the event, finishes the job,
+      // and notifies the job waiter before our original thread in the task scheduler finishes
+      // handling the event and marks the taskset as complete.  So its ok if we need to wait a
+      // *little* bit longer for the original taskscheduler thread to finish up to deal w/ the race.
+      eventually(timeout(1 second), interval(10 millis)) {
+        assert(taskScheduler.runningTaskSets.isEmpty)
+      }
       assert(!backend.hasTasks)
     } else {
       assert(failure != null)

