baibaichen opened a new issue, #11922:
URL: https://github.com/apache/gluten/issues/11922
## Backend
VL (Velox)
**Gluten version**: main branch
## Description
Spark 4.1 introduced memory-based shuffle spill thresholds (SPARK-49386,
JIRA type: Improvement). The new `spillSizeThreshold` parameter enables
spilling by data size rather than only by row count. Gluten's shuffle
implementation does not support this threshold.
This affects Spark 4.1 only.
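
To illustrate the difference between a row-count threshold and a size threshold, here is a minimal Python sketch. It is not Spark's or Gluten's code: `SpillableBuffer`, `num_rows_threshold`, `size_threshold`, and `count_spills` are hypothetical names chosen for this example, and the "spill" only increments a counter instead of writing to disk.

```python
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class SpillableBuffer:
    """Toy buffer that spills on a row-count OR a byte-size threshold.

    All names here are hypothetical; this only models the idea described above.
    """
    num_rows_threshold: int  # spill once this many rows are buffered
    size_threshold: int      # spill once this many bytes are buffered
    rows: List[bytes] = field(default_factory=list)
    buffered_bytes: int = 0
    spill_count: int = 0

    def append(self, row: bytes) -> None:
        self.rows.append(row)
        self.buffered_bytes += len(row)
        # Either threshold being reached triggers a spill.
        if (len(self.rows) >= self.num_rows_threshold
                or self.buffered_bytes >= self.size_threshold):
            self._spill()

    def _spill(self) -> None:
        # A real implementation would write the rows to disk; we only count.
        self.rows.clear()
        self.buffered_bytes = 0
        self.spill_count += 1


def count_spills(row_sizes: Iterable[int],
                 num_rows_threshold: int,
                 size_threshold: int) -> int:
    """Feed rows of the given byte sizes into a buffer and count spills."""
    buf = SpillableBuffer(num_rows_threshold, size_threshold)
    for size in row_sizes:
        buf.append(b"x" * size)
    return buf.spill_count
```

With a high row-count threshold and a low size threshold, a handful of wide rows is enough to trigger spills that a row-count-only check would never see. That is the kind of behavior the tests listed under Impact appear to rely on.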
**Parent issue**: #11910 (`[VL] Spark 4.x: Tracking new feature support`)
### Impact
| Suite | Excluded test | spark40 | spark41 |
|-------|---------|:-------:|:-------:|
| GlutenDataFrameWindowFunctionsSuite | SPARK-49386 spill | 🟢 | 🔴 |
| GlutenJoinSuite | SPARK-49386 SortMergeJoin spill | 🟢 | 🔴 |
Note: `GlutenSQLWindowFunctionSuite` has a pre-existing spill issue ("low
buffer spill threshold") that is unrelated to SPARK-49386 and is out of scope for this issue.
### References
- Apache Spark JIRA: [SPARK-49386](https://issues.apache.org/jira/browse/SPARK-49386)