This is an automated email from the ASF dual-hosted git repository.
wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 4badef3 [SPARK-32000][CORE][TESTS] Fix the flaky test for partially
launched task in barrier-mode
4badef3 is described below
commit 4badef38a52849b4af0b211523de6b09f73397f1
Author: yi.wu <[email protected]>
AuthorDate: Wed Jun 17 13:28:47 2020 +0000
[SPARK-32000][CORE][TESTS] Fix the flaky test for partially launched task
in barrier-mode
### What changes were proposed in this pull request?
This PR changes the test to look up an active executor ID and use it as the
preferred location, instead of hard-coding a fixed preferred location.
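For illustration, a minimal sketch of the new approach as it sits inside the suite (it relies on the suite's existing `initLocalClusterSparkContext` helper and `MyRDD` test RDD, and on `SparkContext.getExecutorIds()`, which is `private[spark]`, so this only compiles inside the Spark source tree):

```scala
// Sketch only: initLocalClusterSparkContext and MyRDD are existing test
// helpers, and sc.getExecutorIds() is private[spark].
initLocalClusterSparkContext(2)

// Ask the scheduler which executors actually registered instead of assuming
// one of them got the ID "0".
val id = sc.getExecutorIds().head

val rdd0 = sc.parallelize(Seq(0, 1, 2, 3), 2)
val dep = new OneToOneDependency[Int](rdd0)

// "executor_<host>_<executorId>" is the string form Spark uses for executor
// task locations; both partitions prefer the same (actually running) executor.
val rdd = new MyRDD(sc, 2, List(dep),
  Seq(Seq(s"executor_h_$id"), Seq(s"executor_h_$id")))
```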
### Why are the changes needed?
The test is flaky. After checking the
[log](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/124086/artifact/core/),
I found that the root cause is:
Two test cases from different test suites got submitted at the same time
because of concurrent execution. In this particular case, the two test cases
(from DistributedSuite and BarrierTaskContextSuite) both launch under
local-cluster mode. The two applications are submitted at the SAME time, so they
get the same application ID (app-20200615210132-0000). Thus, when the cluster of
BarrierTaskContextSuite is launching executors, it fails to create the
directory for executor 0, because [...]
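To make the collision concrete, here is a hedged, self-contained sketch of the failure mode; the directory layout and error message paraphrase the standalone Worker's executor-launch path, and the work-directory path is purely illustrative:

```scala
import java.io.{File, IOException}

// Paraphrase of the standalone Worker's executor launch: the executor works in
// <workDir>/<appId>/<execId>, and a failed mkdirs() aborts the launch.
def createExecutorDir(workDir: File, appId: String, execId: Int): File = {
  val executorDir = new File(workDir, s"$appId/$execId")
  if (!executorDir.mkdirs()) {
    throw new IOException(s"Failed to create directory $executorDir")
  }
  executorDir
}

// Standalone application IDs embed a timestamp with one-second resolution
// (e.g. app-20200615210132-0000), so two local-cluster apps submitted within
// the same second can collide if their workers share a work directory.
val workDir = new File(System.getProperty("java.io.tmpdir"), "spark-work-demo")
val appId = "app-20200615210132-0000"

createExecutorDir(workDir, appId, execId = 0)      // first cluster succeeds
try {
  createExecutorDir(workDir, appId, execId = 0)    // second cluster fails
} catch {
  case e: IOException => println(s"Second launch failed: ${e.getMessage}")
}
```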
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
The test cannot be reproduced locally. We can only confirm the fix once the
test is no longer flaky on Jenkins.
Closes #28849 from Ngone51/fix-spark-32000.
Authored-by: yi.wu <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
---
.../scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
index 01c82f8..d18ca36 100644
--- a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
+++ b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
@@ -276,11 +276,12 @@ class BarrierTaskContextSuite extends SparkFunSuite with LocalSparkContext with
 
   test("SPARK-31485: barrier stage should fail if only partial tasks are launched") {
     initLocalClusterSparkContext(2)
+    val id = sc.getExecutorIds().head
     val rdd0 = sc.parallelize(Seq(0, 1, 2, 3), 2)
     val dep = new OneToOneDependency[Int](rdd0)
-    // set up a barrier stage with 2 tasks and both tasks prefer executor 0 (only 1 core) for
+    // set up a barrier stage with 2 tasks and both tasks prefer the same executor (only 1 core) for
     // scheduling. So, one of tasks won't be scheduled in one round of resource offer.
-    val rdd = new MyRDD(sc, 2, List(dep), Seq(Seq("executor_h_0"), Seq("executor_h_0")))
+    val rdd = new MyRDD(sc, 2, List(dep), Seq(Seq(s"executor_h_$id"), Seq(s"executor_h_$id")))
     val errorMsg = intercept[SparkException] {
       rdd.barrier().mapPartitions { iter =>
         BarrierTaskContext.get().barrier()
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]