This is an automated email from the ASF dual-hosted git repository.
wenchen pushed a commit to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/master by this push:
new 2afc713698bc [SPARK-54830][CORE] Enable checksum based indeterminate
shuffle retry by default
2afc713698bc is described below
commit 2afc713698bc480c5fa1a8aa09e4812787af4f41
Author: Tengfei Huang <[email protected]>
AuthorDate: Mon Dec 29 20:55:26 2025 +0800
[SPARK-54830][CORE] Enable checksum based indeterminate shuffle retry by
default
### What changes were proposed in this pull request?
Enable checksum based indeterminate shuffle retry by default.
Increase the JVM memory size to 6g for `sql` module tests, because the test case
[SPARK-48037: Fix SortShuffleWriter lacks shuffle write related metrics
resulting in potentially inaccurate
data](https://github.com/apache/spark/blob/316322cbcb55ff5c1b4e479bc2aae12babdae534/sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala#L2696)
sets the number of shuffle partitions to `16777216`, which requires more memory
to compute the order-independent shuffle checksum.
### Why are the changes needed?
Because the checksum-based approach detects changes in indeterminate shuffle
output more accurately, enable it by default to avoid query correctness
issues caused by indeterminate shuffle retries.
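
To illustrate what "order independent" means here, the following is a minimal sketch and not Spark's actual implementation. It assumes rows are plain strings, hashes each row with CRC32, and combines the per-row values by addition so that the result does not depend on row order. The object and method names (`OrderIndependentChecksumSketch`, `rowChecksum`, `partitionChecksum`, `outputChanged`) are hypothetical and exist only for this example.

```scala
import java.nio.charset.StandardCharsets
import java.util.zip.CRC32

// Hypothetical sketch: NOT the Spark implementation, only an illustration
// of a checksum that ignores the order of rows within a partition.
object OrderIndependentChecksumSketch {

  // Checksum of a single row (assumption: a row is a plain string).
  def rowChecksum(row: String): Long = {
    val crc = new CRC32()
    crc.update(row.getBytes(StandardCharsets.UTF_8))
    crc.getValue
  }

  // Combine per-row checksums with addition. Addition is commutative and
  // associative, so any ordering of the same rows gives the same result.
  // Long overflow simply wraps around, which keeps the property intact.
  def partitionChecksum(rows: Iterable[String]): Long =
    rows.iterator.map(rowChecksum).foldLeft(0L)(_ + _)

  // A retried task whose checksum differs from the first attempt produced
  // different data, not merely the same data in a different order.
  def outputChanged(firstAttempt: Iterable[String],
                    retryAttempt: Iterable[String]): Boolean =
    partitionChecksum(firstAttempt) != partitionChecksum(retryAttempt)
}
```

In this sketch, a retry that emits the same rows in a different order yields an equal checksum, while a retry that emits different rows is reported as changed, which is the situation where consumer stages would need to be retried.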
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
Existing UTs.
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #53574 from ivoson/SPARK-54556-followup.
Authored-by: Tengfei Huang <[email protected]>
Signed-off-by: Wenchen Fan <[email protected]>
---
project/SparkBuild.scala | 3 +++
.../src/main/scala/org/apache/spark/sql/internal/SQLConf.scala | 4 ++--
2 files changed, 5 insertions(+), 2 deletions(-)
diff --git a/project/SparkBuild.scala b/project/SparkBuild.scala
index 924b4df98a56..83b5ee84478a 100644
--- a/project/SparkBuild.scala
+++ b/project/SparkBuild.scala
@@ -1322,6 +1322,9 @@ object SqlApi {
object SQL {
import BuildCommons.protoVersion
lazy val settings = Seq(
+ // SPARK-54830: avoid AdaptiveQueryExecSuite OOM, since computing order independent shuffle checksum needs more
+ // memory for test case introduced by SPARK-48037 which set shuffle partition to 16777216
+ (Test / javaOptions) += "-Xmx6g",
// Setting version for the protobuf compiler. This has to be propagated to every sub-project
// even if the project is not using it.
PB.protocVersion := BuildCommons.protoVersion,
diff --git a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
index 95f5a3a4fabf..a8178ce8ce2b 100644
--- a/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
+++ b/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
@@ -907,7 +907,7 @@ object SQLConf {
"retry all tasks of the consumer stages to avoid correctness issues.")
.version("4.1.0")
.booleanConf
- .createWithDefault(false)
+ .createWithDefault(true)
private[spark] val SHUFFLE_CHECKSUM_MISMATCH_FULL_RETRY_ENABLED =
buildConf("spark.sql.shuffle.orderIndependentChecksum.enableFullRetryOnMismatch")
@@ -915,7 +915,7 @@ object SQLConf {
"with its producer stages.")
.version("4.1.0")
.booleanConf
- .createWithDefault(false)
+ .createWithDefault(true)
val SHUFFLE_TARGET_POSTSHUFFLE_INPUT_SIZE =
buildConf("spark.sql.adaptive.shuffle.targetPostShuffleInputSize")