wForget commented on code in PR #3076:
URL: https://github.com/apache/datafusion-comet/pull/3076#discussion_r2719279083


##########
common/src/main/scala/org/apache/comet/CometConf.scala:
##########
@@ -365,6 +365,33 @@ object CometConf extends ShimCometConf {
       .booleanConf
       .createWithDefault(true)
 
+  val COMET_EXEC_SHUFFLE_WITH_ROUND_ROBIN_PARTITIONING_ENABLED: ConfigEntry[Boolean] =
+    conf("spark.comet.native.shuffle.partitioning.roundrobin.enabled")
+      .category(CATEGORY_SHUFFLE)
+      .doc(
+        "Whether to enable round robin partitioning for Comet native shuffle. " +
+          "This is disabled by default because Comet's round-robin produces different " +
+          "partition assignments than Spark. Spark sorts rows by their binary UnsafeRow " +
+          "representation before assigning partitions, but Comet uses Arrow format which " +
+          "has a different binary layout. Instead, Comet implements round-robin as hash " +
+          "partitioning on all columns, which achieves the same goals: even distribution, " +
+          "deterministic output (for fault tolerance), and no semantic grouping. " +
+          "Sorted output will be identical to Spark, but unsorted row ordering may differ.")
+      .booleanConf
+      .createWithDefault(false)
+
+  val COMET_EXEC_SHUFFLE_WITH_ROUND_ROBIN_PARTITIONING_MAX_HASH_COLUMNS: ConfigEntry[Int] =
+    conf("spark.comet.native.shuffle.partitioning.roundrobin.maxHashColumns")
+      .category(CATEGORY_SHUFFLE)
+      .doc(
+        "The maximum number of columns to hash for round robin partitioning. " +
+          "When set to 0 (the default), all columns are hashed. " +
+          "When set to a positive value, only the first N columns are used for hashing, " +
+          "which can improve performance for wide tables while still providing " +
+          "reasonable distribution.")
+      .intConf
+      .createWithDefault(0)

Review Comment:
   Please add a `checkValue` so that negative values are rejected:
   ```
   .checkValue(v => v >= 0, "The maximum number of columns to hash for round robin partitioning must be non-negative.")
   ```
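
   To illustrate what the check is meant to guard against, here is a minimal, self-contained sketch. `IntConfBuilder` and its `validate` method are hypothetical stand-ins written for this example, not Comet's actual builder classes; only the `checkValue(validator, errorMsg)` shape is taken from the suggestion above.

   ```scala
   // Hypothetical stand-in for a typed config builder (NOT Comet's real class).
   // It only records validators and applies them to a candidate value.
   final case class IntConfBuilder(
       key: String,
       validators: List[(Int => Boolean, String)] = Nil) {

     // Register a predicate together with the message reported when it fails.
     def checkValue(validator: Int => Boolean, errorMsg: String): IntConfBuilder =
       copy(validators = validators :+ ((validator, errorMsg)))

     // Return Left(message) for the first failing validator, else Right(value).
     def validate(value: Int): Either[String, Int] =
       validators.collectFirst { case (ok, msg) if !ok(value) => msg }.toLeft(value)
   }

   object CheckValueSketch {
     val maxHashColumns: IntConfBuilder =
       IntConfBuilder("spark.comet.native.shuffle.partitioning.roundrobin.maxHashColumns")
         .checkValue(
           v => v >= 0,
           "The maximum number of columns to hash for round robin partitioning " +
             "must be non-negative.")

     def main(args: Array[String]): Unit = {
       // 0 (hash all columns) and positive values pass; negative values fail.
       Seq(0, 4, -1).foreach(v => println(s"$v -> ${maxHashColumns.validate(v)}"))
     }
   }
   ```

   In the PR itself the call would presumably sit between `.intConf` and `.createWithDefault(0)`, so that 0 and positive values are accepted and a negative setting fails fast instead of reaching the native shuffle code.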



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

