comphead commented on code in PR #3076:
URL: https://github.com/apache/datafusion-comet/pull/3076#discussion_r2695966185


##########
docs/source/user-guide/latest/configs.md:
##########
@@ -110,6 +110,8 @@ These settings can be used to determine which parts of the 
plan are accelerated
 | `spark.comet.exec.shuffle.writeBufferSize` | Size of the write buffer in 
bytes used by the native shuffle writer when writing shuffle data to disk. 
Larger values may improve write performance by reducing the number of system 
calls, but will use more memory. The default is 1MB which provides a good 
balance between performance and memory usage. | 1048576b |
 | `spark.comet.native.shuffle.partitioning.hash.enabled` | Whether to enable 
hash partitioning for Comet native shuffle. | true |
 | `spark.comet.native.shuffle.partitioning.range.enabled` | Whether to enable 
range partitioning for Comet native shuffle. | true |
+| `spark.comet.native.shuffle.partitioning.roundrobin.enabled` | Whether to 
enable round robin partitioning for Comet native shuffle. This is disabled by 
default because Comet's round-robin produces different partition assignments 
than Spark. Spark sorts rows by their binary UnsafeRow representation before 
assigning partitions, but Comet uses Arrow format which has a different binary 
layout. Instead, Comet implements round-robin as hash partitioning on all 
columns, which achieves the same goals: even distribution, deterministic output 
(for fault tolerance), and no semantic grouping. This is functionally correct 
but may cause test failures when comparing results with Spark. | false |

Review Comment:
   The phrase "This is functionally correct but may cause test failures when 
comparing results with Spark." is confusing and ambiguous IMO. We should 
probably just mention that, without the sort, the output row order might not 
be the same as Spark's.
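
   To make the difference concrete, here is a minimal sketch of the idea. It is 
not Comet's or Spark's actual code. `hash_partition` and 
`positional_round_robin` are hypothetical helpers, and `zlib.crc32` over 
`repr(row)` is only a stand-in for whatever hash function the native shuffle 
really uses. It shows that hashing all columns gives each row a stable 
partition regardless of input order, while a position-based assignment 
depends on the order in which rows arrive:

```python
import zlib


def hash_partition(row, num_partitions):
    """Pick a partition by hashing every column of the row (illustrative only)."""
    # repr() of a tuple of ints and strs is stable across runs, unlike the
    # built-in hash() on str, which is randomized per process.
    key = repr(row).encode("utf-8")
    return zlib.crc32(key) % num_partitions


def positional_round_robin(rows, num_partitions):
    """Pick a partition from the row's position in the input (illustrative only)."""
    return {row: i % num_partitions for i, row in enumerate(rows)}


rows = [(1, "a"), (2, "b"), (3, "c"), (4, "d")]
reordered = list(reversed(rows))  # same rows, different arrival order

# Hashing all columns: the assignment depends only on the row contents.
by_hash = {row: hash_partition(row, 3) for row in rows}
by_hash_reordered = {row: hash_partition(row, 3) for row in reordered}

# Position-based: the assignment changes when the input order changes.
by_position = positional_round_robin(rows, 3)
by_position_reordered = positional_round_robin(reordered, 3)

print("hash stable across orderings:", by_hash == by_hash_reordered)
print("positional stable across orderings:", by_position == by_position_reordered)
```

   Either scheme spreads rows across partitions, but the rows that end up in 
each partition, and therefore the output row order, can differ between the 
two, which is the point the doc should state.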



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

