This is an automated email from the ASF dual-hosted git repository.
dongjoon pushed a commit to branch branch-3.5
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-3.5 by this push:
new d37a8b9c97b3 [SPARK-49983][CORE][TESTS] Fix
`BarrierTaskContextSuite.successively sync with allGather and barrier` test
case to be robust
d37a8b9c97b3 is described below
commit d37a8b9c97b33ede4c7c42ec8c16b7c152d42b86
Author: Dongjoon Hyun <[email protected]>
AuthorDate: Wed Oct 16 07:24:33 2024 -0700
[SPARK-49983][CORE][TESTS] Fix `BarrierTaskContextSuite.successively sync
with allGather and barrier` test case to be robust
### What changes were proposed in this pull request?
This PR aims to fix `BarrierTaskContextSuite.successively sync with
allGather and barrier` test case to be robust.
### Why are the changes needed?
The test case asserts the duration of partitions. However, this is flaky
because we don't know when a partition is triggered before `barrier` sync.
https://github.com/apache/spark/blob/0e75d19a736aa18fe77414991ebb7e3577a43af8/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala#L116-L118
Although we added `TestUtils.waitUntilExecutorsUp` at Apache Spark 3.0.0
like the following,
- #28658
let's say a partition starts slowly than `38ms` and all partitions sleep
`1s` exactly. Then, the test case fails like the following.
- https://github.com/apache/spark/actions/runs/11298639789/job/31428018075
```
BarrierTaskContextSuite:
...
- successively sync with allGather and barrier *** FAILED ***
1038 was not less than or equal to 1000
(BarrierTaskContextSuite.scala:118)
```
According to the failure history here (SPARK-49983) and SPARK-31730, the
slowness seems to be less than `200ms` when it happens. So, this PR aims to
reduce the flakiness by capping the sleep up to 500ms while keeping the `1s`
validation. There is no test coverage change because this test case focuses on
the `successively sync with allGather and battier`.
### Does this PR introduce _any_ user-facing change?
No, this is a test-only test case.
### How was this patch tested?
Pass the CIs.
### Was this patch authored or co-authored using generative AI tooling?
No.
Closes #48487 from dongjoon-hyun/SPARK-49983.
Authored-by: Dongjoon Hyun <[email protected]>
Signed-off-by: Dongjoon Hyun <[email protected]>
(cherry picked from commit bcfe62b9988f9b00c23de0b71acc1c6170edee9e)
Signed-off-by: Dongjoon Hyun <[email protected]>
---
.../test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git
a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
index 26cd5374fa09..f370a8e02391 100644
---
a/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
+++
b/core/src/test/scala/org/apache/spark/scheduler/BarrierTaskContextSuite.scala
@@ -101,7 +101,7 @@ class BarrierTaskContextSuite extends SparkFunSuite with
LocalSparkContext with
val rdd2 = rdd.barrier().mapPartitions { it =>
val context = BarrierTaskContext.get()
// Sleep for a random time before global sync.
- Thread.sleep(Random.nextInt(1000))
+ Thread.sleep(Random.nextInt(500))
context.barrier()
val time1 = System.currentTimeMillis()
// Sleep for a random time before global sync.
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]