andygrove opened a new pull request, #4360:
URL: https://github.com/apache/datafusion-comet/pull/4360
## Which issue does this PR close?
Closes #4327.
## Rationale for this change
The `spark-sql-auto-sql_core-1/spark-4.0.2-jdk21` job has been failing
frequently with:
```
[info] ParquetFileFormatV{1,2}Suite (or OrcSourceV{1,2}Suite) *** ABORTED ***
[info] \"There are 1 possibly leaked file streams.\"
(SharedSparkSession.scala:189)
```
This is Spark's `DebugFilesystem.assertNoOpenStreams()` in
`SharedSparkSession.afterEach`, retried via `eventually` for ~10s.
These four suites are flaky on JDK 21 even in apache/spark's own CI; the
upstream workaround is `DEDICATED_JVM_SBT_TESTS`, which asks SBT to fork a
dedicated JVM per listed suite. Comet's workflow already sets that env var for
the Spark 4.0 row, but it has no effect because the workflow also
unconditionally sets `SERIAL_SBT_TESTS=1` (added in #4285 to cap peak memory on
standard runners). In `project/SparkBuild.scala`:
```scala
if (!sys.env.contains(\"SERIAL_SBT_TESTS\")) {
allProjects.foreach(enable(SparkParallelTestGrouping.settings))
}
```
`SparkParallelTestGrouping.settings` is the only consumer of
`DEDICATED_JVM_SBT_TESTS`, so when `SERIAL_SBT_TESTS` is set the env var is
read into an unused set and every suite runs in one shared forked JVM.
Cross-suite state accumulation from ~11k prior tests is what trips the
file-stream leak detection.
Evidence from a failing log ([run
26004020697](https://github.com/apache/datafusion-comet/actions/runs/26004020697/job/76441382377)):
- thread-leak warning mentions
`readingParquetFooters-ForkJoinPool-12260-worker-1` (high pool counter = many
prior suites in this JVM)
- unrelated suites (`AnalysisConfOverrideSuite`,
`TPCDSModifiedPlanStabilityWithStatsSuite`, state-store suites) appear in the
same JVM right before these Parquet/ORC suites
- no SBT fork-restart markers between unrelated suites
## What changes are included in this PR?
- Drop `SERIAL_SBT_TESTS=1` only for the Spark 4.0.2/JDK 21 matrix row, so
Spark's `SparkParallelTestGrouping` is installed and `DEDICATED_JVM_SBT_TESTS`
actually forks a separate JVM per listed suite. Other rows keep
`SERIAL_SBT_TESTS=1` so their memory profile is unchanged.
- Override `Global / concurrentRestrictions` to cap parallel forked test
JVMs at 1. Spark's defaults would otherwise allow up to 4 in parallel (each
`-Xmx2g`), which would exceed the 7 GB runner budget. The cap is a no-op for
rows that still have `SERIAL_SBT_TESTS=1`.
## How are these changes tested?
CI on this PR will exercise the new path. A successful run for
`spark-sql-auto-sql_core-1/spark-4.0.2-jdk21` confirms that the dedicated-JVM
workaround is now in effect and the four problem suites no longer abort.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]