Repository: spark
Updated Branches:
refs/heads/branch-2.4 5cc2987db -> 3d2fce5a3
[SPARK-25899][TESTS] Fix flaky CoarseGrainedSchedulerBackendSuite
## What changes were proposed in this pull request?
I saw CoarseGrainedSchedulerBackendSuite fail in my PR and finally reproduced
the following error on a very busy machine:
```
sbt.ForkMain$ForkError:
org.scalatest.exceptions.TestFailedDueToTimeoutException: The code passed to
eventually never returned normally. Attempted 400 times over 10.009828643999999
seconds. Last failure message: ArrayBuffer("2", "0", "3") had length 3 instead
of expected length 4.
```
The logs from this test show that executor 1 was not up when the test failed.
```
18/10/30 11:34:03.563 dispatcher-event-loop-12 INFO
CoarseGrainedSchedulerBackend$DriverEndpoint: Registered executor
NettyRpcEndpointRef(spark-client://Executor) (172.17.0.2:43656) with ID 2
18/10/30 11:34:03.593 dispatcher-event-loop-3 INFO
CoarseGrainedSchedulerBackend$DriverEndpoint: Registered executor
NettyRpcEndpointRef(spark-client://Executor) (172.17.0.2:43658) with ID 3
18/10/30 11:34:03.629 dispatcher-event-loop-6 INFO
CoarseGrainedSchedulerBackend$DriverEndpoint: Registered executor
NettyRpcEndpointRef(spark-client://Executor) (172.17.0.2:43654) with ID 0
18/10/30 11:34:03.885
pool-1-thread-1-ScalaTest-running-CoarseGrainedSchedulerBackendSuite INFO
CoarseGrainedSchedulerBackendSuite:
===== FINISHED o.a.s.scheduler.CoarseGrainedSchedulerBackendSuite: 'compute max
number of concurrent tasks can be launched' =====
```
The following logs from executor 1 show that it was still initializing when
the timeout fired (at 18/10/30 11:34:03.885).
```
18/10/30 11:34:03.463 netty-rpc-connection-0 INFO TransportClientFactory:
Successfully created connection to 54b6b6217301/172.17.0.2:33741 after 37 ms (0
ms spent in bootstraps)
18/10/30 11:34:03.959 main INFO DiskBlockManager: Created local directory at
/home/jenkins/workspace/core/target/tmp/spark-383518bc-53bd-4d9c-885b-d881f03875bf/executor-61c406e4-178f-40a6-ac2c-7314ee6fb142/blockmgr-03fb84a1-eedc-4055-8743-682eb3ac5c67
18/10/30 11:34:03.993 main INFO MemoryStore: MemoryStore started with capacity
546.3 MB
```
Hence, I think the current 10-second timeout is not enough on a busy Jenkins
machine. This PR increases the timeout from 10 seconds to 60 seconds to make
the test more stable.
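For context, the pattern the test relies on is ScalaTest's `eventually(timeout(...))`, which retries a block until it stops throwing or the deadline passes. Below is a minimal self-contained sketch of that retry-until-deadline behavior; it is illustrative only (the suite uses `org.scalatest.concurrent.Eventually`, and the `EventuallySketch` object and its parameters here are hypothetical):

```scala
import scala.concurrent.duration._

object EventuallySketch {
  // Retry `block` until it stops throwing, or rethrow its failure once
  // `timeout` has elapsed. Mirrors the shape of ScalaTest's eventually.
  def eventually[T](timeout: FiniteDuration,
                    interval: FiniteDuration = 15.millis)(block: => T): T = {
    val deadline = timeout.fromNow
    def loop(): T =
      try block
      catch {
        case _: Throwable if deadline.hasTimeLeft() =>
          Thread.sleep(interval.toMillis)
          loop()
      }
    loop()
  }

  def main(args: Array[String]): Unit = {
    // Simulate executors registering one at a time; the assertion only
    // holds on the fourth attempt, so earlier attempts are retried.
    var registered = 0
    val result = eventually(timeout = 60.seconds) {
      registered += 1
      assert(registered == 4, s"only $registered executors up")
      registered
    }
    println(result) // prints 4
  }
}
```

The key trade-off, as in the PR, is that a generous timeout only delays failure on a genuinely broken run, while a tight one makes the test flaky on a slow machine.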
## How was this patch tested?
Jenkins
Closes #22910 from zsxwing/fix-flaky-test.
Authored-by: Shixiong Zhu <[email protected]>
Signed-off-by: gatorsmile <[email protected]>
(cherry picked from commit 6be3cce751fd0abf00d668c771f56093f2fa6817)
Signed-off-by: gatorsmile <[email protected]>
Project: http://git-wip-us.apache.org/repos/asf/spark/repo
Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/3d2fce5a
Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/3d2fce5a
Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/3d2fce5a
Branch: refs/heads/branch-2.4
Commit: 3d2fce5a3275a8f5d50a6894198297cddc022843
Parents: 5cc2987
Author: Shixiong Zhu <[email protected]>
Authored: Wed Oct 31 15:14:10 2018 -0700
Committer: gatorsmile <[email protected]>
Committed: Wed Oct 31 15:14:24 2018 -0700
----------------------------------------------------------------------
.../spark/scheduler/CoarseGrainedSchedulerBackendSuite.scala | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
----------------------------------------------------------------------
http://git-wip-us.apache.org/repos/asf/spark/blob/3d2fce5a/core/src/test/scala/org/apache/spark/scheduler/CoarseGrainedSchedulerBackendSuite.scala
----------------------------------------------------------------------
diff --git a/core/src/test/scala/org/apache/spark/scheduler/CoarseGrainedSchedulerBackendSuite.scala b/core/src/test/scala/org/apache/spark/scheduler/CoarseGrainedSchedulerBackendSuite.scala
index 80c9c6f..c5a3966 100644
--- a/core/src/test/scala/org/apache/spark/scheduler/CoarseGrainedSchedulerBackendSuite.scala
+++ b/core/src/test/scala/org/apache/spark/scheduler/CoarseGrainedSchedulerBackendSuite.scala
@@ -30,6 +30,8 @@ import org.apache.spark.util.{RpcUtils, SerializableBuffer}
 class CoarseGrainedSchedulerBackendSuite extends SparkFunSuite with LocalSparkContext
   with Eventually {
+  private val executorUpTimeout = 60.seconds
+
   test("serialized task larger than max RPC message size") {
     val conf = new SparkConf
     conf.set("spark.rpc.message.maxSize", "1")
@@ -51,7 +53,7 @@ class CoarseGrainedSchedulerBackendSuite extends SparkFunSuite with LocalSparkCo
       .setMaster("local-cluster[4, 3, 1024]")
       .setAppName("test")
     sc = new SparkContext(conf)
-    eventually(timeout(10.seconds)) {
+    eventually(timeout(executorUpTimeout)) {
       // Ensure all executors have been launched.
       assert(sc.getExecutorIds().length == 4)
     }
@@ -64,7 +66,7 @@ class CoarseGrainedSchedulerBackendSuite extends SparkFunSuite with LocalSparkCo
       .setMaster("local-cluster[4, 3, 1024]")
       .setAppName("test")
     sc = new SparkContext(conf)
-    eventually(timeout(10.seconds)) {
+    eventually(timeout(executorUpTimeout)) {
       // Ensure all executors have been launched.
       assert(sc.getExecutorIds().length == 4)
     }
@@ -96,7 +98,7 @@ class CoarseGrainedSchedulerBackendSuite extends SparkFunSuite with LocalSparkCo
     try {
       sc.addSparkListener(listener)
-      eventually(timeout(10.seconds)) {
+      eventually(timeout(executorUpTimeout)) {
         // Ensure all executors have been launched.
         assert(sc.getExecutorIds().length == 4)
       }