This is an automated email from the ASF dual-hosted git repository.

wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/master by this push:
     new 4badef3  [SPARK-32000][CORE][TESTS] Fix the flaky test for partially launched task in barrier-mode
4badef3 is described below

commit 4badef38a52849b4af0b211523de6b09f73397f1
Author: yi.wu <[email protected]>
AuthorDate: Wed Jun 17 13:28:47 2020 +0000

    [SPARK-32000][CORE][TESTS] Fix the flaky test for partially launched task in barrier-mode
    
    ### What changes were proposed in this pull request?
    
    This PR changes the test to get an active executor ID and set it as the preferred location, instead of hardcoding a fixed preferred location.
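
    As a hedged illustration (not part of the patch), the new approach can be sketched as a pure function: given the currently active executor IDs, build the preferred locations for both barrier tasks from a live ID rather than assuming executor 0 exists. The helper name `preferredLocs` is hypothetical; in the actual test the IDs come from `sc.getExecutorIds()`, as the diff below shows.

```scala
// Hypothetical helper sketching the fix: derive the preferred locations
// from whatever executor is actually alive, rather than assuming "0".
def preferredLocs(activeExecutorIds: Seq[String]): Seq[Seq[String]] = {
  // In the real test this list comes from sc.getExecutorIds().
  val id = activeExecutorIds.head
  // Both barrier tasks prefer the same single-core executor, so only one
  // of them can be launched in a single round of resource offers.
  Seq(Seq(s"executor_h_$id"), Seq(s"executor_h_$id"))
}
```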
    
    ### Why are the changes needed?
    
    The test is flaky. After checking the [log](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124086/artifact/core/), I found the root cause:
    
    Two test cases from different test suites were submitted at the same time because of concurrent execution. In this particular case, the two test cases (from DistributedSuite and BarrierTaskContextSuite) both launch under local-cluster mode. The two applications were submitted at the SAME time, so they got the same application ID (app-20200615210132-0000). Thus, when the cluster of BarrierTaskContextSuite was launching executors, it failed to create the directory for executor 0, because [...]
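    
    For context, a minimal sketch of why IDs can collide, assuming the standalone Master derives application IDs from a second-resolution timestamp plus a per-Master counter (the function below is an illustration inferred from the ID format in the log, not the actual `Master` code):

```scala
import java.text.SimpleDateFormat
import java.util.{Date, Locale}

// Assumed scheme: "app-" + second-resolution timestamp + per-Master sequence.
def newApplicationId(submitDate: Date, appNumber: Int): String = {
  val createDateFormat = new SimpleDateFormat("yyyyMMddHHmmss", Locale.US)
  "app-%s-%04d".format(createDateFormat.format(submitDate), appNumber)
}
```

    Under this scheme, two independent local-cluster Masters each start their counter at 0, so submissions landing in the same second yield identical IDs such as app-20200615210132-0000, and the second application trips over the first one's executor directory.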
    
    ### Does this PR introduce _any_ user-facing change?
    
    No.
    
    ### How was this patch tested?
    
    The test cannot be reproduced locally. We can only know it's been fixed once it is no longer flaky on Jenkins.
    
    Closes #28849 from Ngone51/fix-spark-32000.
    
    Authored-by: yi.wu <[email protected]>
    Signed-off-by: Wenchen Fan <[email protected]>
---
 .../scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala   | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
index 01c82f8..d18ca36 100644
--- a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
+++ b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
@@ -276,11 +276,12 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with
 
   test("SPARK-31485: barrier stage should fail if only partial tasks are launched") {
     initLocalClusterSparkContext(2)
+    val id = sc.getExecutorIds().head
     val rdd0 = sc.parallelize(Seq(0, 1, 2, 3), 2)
     val dep = new OneToOneDependency[Int](rdd0)
-    // set up a barrier stage with 2 tasks and both tasks prefer executor 0 (only 1 core) for
+    // set up a barrier stage with 2 tasks and both tasks prefer the same executor (only 1 core) for
     // scheduling. So, one of tasks won't be scheduled in one round of resource offer.
-    val rdd = new MyRDD(sc, 2, List(dep), Seq(Seq("executor_h_0"), Seq("executor_h_0")))
+    val rdd = new MyRDD(sc, 2, List(dep), Seq(Seq(s"executor_h_$id"), Seq(s"executor_h_$id")))
     val errorMsg = intercept[SparkException] {
       rdd.barrier().mapPartitions { iter =>
         BarrierTaskContext.get().barrier()


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
