Hyukjin Kwon created SPARK-57650:
------------------------------------
Summary: YarnClusterSuite tests intermittently time out due to the
test mini-cluster's default AM resource limit
Key: SPARK-57650
URL: https://issues.apache.org/jira/browse/SPARK-57650
Project: Spark
Issue Type: Test
Components: YARN, Tests
Affects Versions: 4.2.0
Reporter: Hyukjin Kwon
h3. Symptom
Several {{YarnClusterSuite}} tests fail intermittently on memory-constrained CI
(observed on the scheduled Maven Scala 2.13 JDK 21 branch-4.2 build and the JDK
17 branch-4.x build) with a 3-minute timeout:
{code}
The code passed to eventually never returned normally. Attempted 190 times over
3.0 minutes. Last failure message: handle.getState().isFinal() was false.
(BaseYarnClusterSuite.scala:213)
{code}
Affected tests: the two "ensuring redaction" tests, "yarn-cluster should
respect conf overrides in SparkHadoopUtil (SPARK-16414, SPARK-23630)", and the
SPARK-35672 'local' URI scheme jar tests.
h3. Root cause
The mini {{CapacityScheduler}} set up in {{BaseYarnClusterSuite}} configures
the queue but never sets
{{yarn.scheduler.capacity.maximum-am-resource-percent}}, so it defaults to 0.1.
On a small CI cluster, that default caps the queue's total AM resource budget
at ~1GB, which is less than the 1-2GB of AM/driver memory these tests request.
The applications therefore stay in the ACCEPTED state (never activated) and the
suite times out. The YARN diagnostics show {{Queue's AM resource limit exceeded. AM
Resource Request = <memory:2048>; Queue Resource Limit for AM = <memory:1024>}}
repeated more than 1000 times.
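The arithmetic behind that diagnostic can be sketched roughly as follows. This is a simplified model, not the scheduler's actual code: the real CapacityScheduler also rounds to allocation increments and applies per-user limits. The 10240 MB queue size is an assumption, back-derived from the 1024 MB limit in the diagnostics, and the names {{AmLimitSketch}}, {{amLimitMb}}, and {{canActivate}} are hypothetical.

```scala
object AmLimitSketch {
  // Simplified model: AM budget = queue memory * maximum-am-resource-percent.
  // The real scheduler additionally normalizes to allocation increments.
  def amLimitMb(queueMemoryMb: Long, maxAmPercent: Double): Long =
    (queueMemoryMb * maxAmPercent).toLong

  // An application is activated only if its AM request fits in the AM budget.
  def canActivate(amRequestMb: Long, queueMemoryMb: Long, maxAmPercent: Double): Boolean =
    amRequestMb <= amLimitMb(queueMemoryMb, maxAmPercent)

  def main(args: Array[String]): Unit = {
    val queueMemoryMb = 10240L // ASSUMPTION: back-derived from the 1024 MB limit
    val amRequestMb = 2048L    // from the diagnostics: <memory:2048>

    // Default percent (0.1) versus the proposed value (1.0).
    for (percent <- Seq(0.1, 1.0)) {
      val limit = amLimitMb(queueMemoryMb, percent)
      val ok = canActivate(amRequestMb, queueMemoryMb, percent)
      println(s"percent=$percent limitMb=$limit activated=$ok")
    }
  }
}
```

Under these assumed numbers, the default leaves the 2048 MB request above the AM budget, while a value of 1.0 makes the whole queue available to AMs.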
h3. Fix
Set {{maximum-am-resource-percent}} to 1.0, both globally and for the
{{root.default}} queue, in {{BaseYarnClusterSuite}} so that test AMs can use the
whole queue and applications are always activated. This is a test-only change,
and it removes the flakiness regardless of runner memory.
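A minimal sketch of the two property keys involved, assuming the standard CapacityScheduler naming scheme ({{yarn.scheduler.capacity.<queue-path>.maximum-am-resource-percent}} for the per-queue form). The helper {{AmResourceOverrides}} is hypothetical and is not the actual patch; it only builds the key/value pairs so that they can be inspected without a Hadoop dependency.

```scala
object AmResourceOverrides {
  // Real CapacityScheduler property prefix and suffix; the helper is hypothetical.
  private val Prefix = "yarn.scheduler.capacity"
  private val Suffix = "maximum-am-resource-percent"

  /** Builds the global key plus one per-queue key for each queue path. */
  def overrides(queuePaths: Seq[String], percent: String): Seq[(String, String)] =
    (s"$Prefix.$Suffix" -> percent) +:
      queuePaths.map(q => s"$Prefix.$q.$Suffix" -> percent)

  def main(args: Array[String]): Unit = {
    // Print the settings described in the fix: global and root.default.
    overrides(Seq("root.default"), "1.0").foreach { case (key, value) =>
      println(s"$key=$value")
    }
  }
}
```

In the suite itself, each pair would presumably be applied to the mini-cluster's Hadoop configuration with {{Configuration.set(key, value)}} before the cluster starts.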