Hyukjin Kwon created SPARK-57650:
------------------------------------

             Summary: YarnClusterSuite tests intermittently time out due to the 
test mini-cluster's default AM resource limit
                 Key: SPARK-57650
                 URL: https://issues.apache.org/jira/browse/SPARK-57650
             Project: Spark
          Issue Type: Test
          Components: YARN, Tests
    Affects Versions: 4.2.0
            Reporter: Hyukjin Kwon


h3. Symptom
Several {{YarnClusterSuite}} tests fail intermittently on memory-constrained CI 
(observed on the scheduled Maven Scala 2.13 JDK 21 branch-4.2 build and the JDK 
17 branch-4.x build) with a 3-minute timeout:
{code}
The code passed to eventually never returned normally. Attempted 190 times over 
3.0 minutes. Last failure message: handle.getState().isFinal() was false. 
(BaseYarnClusterSuite.scala:213)
{code}
Affected tests: the two "ensuring redaction" tests, "yarn-cluster should 
respect conf overrides in SparkHadoopUtil (SPARK-16414, SPARK-23630)", and the 
SPARK-35672 'local' URI scheme jar tests.

h3. Root cause
The mini {{CapacityScheduler}} set up in {{BaseYarnClusterSuite}} configures 
the queue but never sets 
{{yarn.scheduler.capacity.maximum-am-resource-percent}}, so the property falls 
back to its default of 0.1. On a small CI cluster, that default caps the 
queue's total AM resource budget at ~1GB, which is smaller than the 1-2GB of 
AM/driver memory these tests request. The applications therefore stay stuck in 
the ACCEPTED state (never activated), and the suite times out. The YARN 
diagnostics show {{Queue's AM resource limit exceeded. AM Resource Request = 
<memory:2048>; Queue Resource Limit for AM = <memory:1024>}} repeated >1000 
times.
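
The arithmetic behind that diagnostic can be sketched roughly as follows. This is a simplified model, not the actual {{CapacityScheduler}} code: the helper names are hypothetical, and the 10240 MB queue size is an assumption inferred from the reported 1024 MB limit and the 0.1 default, not a value taken from the logs.
{code:scala}
// Simplified model of the queue-level AM limit check. The real scheduler also
// accounts for queue capacity shares, user limits, allocation rounding and
// AM resources already in use by other applications.
object AmLimitSketch {
  // Hypothetical helper: AM budget = queue memory * maximum-am-resource-percent.
  def amLimitMb(queueMemoryMb: Long, maxAmPercent: Double): Long =
    math.floor(queueMemoryMb * maxAmPercent).toLong

  // An application is activated only if its AM request fits within the budget.
  def canActivate(amRequestMb: Long, queueMemoryMb: Long, maxAmPercent: Double): Boolean =
    amRequestMb <= amLimitMb(queueMemoryMb, maxAmPercent)

  def main(args: Array[String]): Unit = {
    val queueMemoryMb = 10240L // assumption: inferred from 1024 MB / 0.1
    val amRequestMb = 2048L    // from the diagnostics: <memory:2048>

    println(canActivate(amRequestMb, queueMemoryMb, 0.1)) // false: 2048 > 1024
    println(canActivate(amRequestMb, queueMemoryMb, 1.0)) // true: 2048 <= 10240
  }
}
{code}
Under this model, a 2048 MB AM request can never be activated at the 0.1 default, however long the test waits.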

h3. Fix
Set {{maximum-am-resource-percent}} to 1.0, both globally and for the 
{{root.default}} queue, in {{BaseYarnClusterSuite}} so that test AMs can use 
the whole queue and applications are always activated. This is a test-only 
change, and it removes the flakiness regardless of runner memory.
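
A minimal sketch of what such a change might look like, assuming the suite builds its mini-cluster settings on a {{YarnConfiguration}}. The object and method names are hypothetical and the actual patch may be structured differently. The per-queue key follows the standard {{yarn.scheduler.capacity.<queue-path>.maximum-am-resource-percent}} pattern.
{code:scala}
import org.apache.hadoop.yarn.conf.YarnConfiguration

// Hypothetical helper, for illustration only; not the code from the patch.
object AmPercentConfSketch {
  // Lets application masters use the entire queue in the test mini-cluster.
  def withFullAmBudget(yarnConf: YarnConfiguration): YarnConfiguration = {
    // Global CapacityScheduler setting (defaults to 0.1 when unset).
    yarnConf.set("yarn.scheduler.capacity.maximum-am-resource-percent", "1.0")
    // Per-queue override for the root.default queue.
    yarnConf.set(
      "yarn.scheduler.capacity.root.default.maximum-am-resource-percent", "1.0")
    yarnConf
  }
}
{code}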


