LiaCastaneda commented on code in PR #20246:
URL: https://github.com/apache/datafusion/pull/20246#discussion_r2798833668


##########
datafusion/common/src/config.rs:
##########
@@ -996,6 +996,11 @@ config_namespace! {
         ///
         /// Note: This may reduce parallelism, rooting from the I/O level, if the number of distinct
         /// partitions is less than the target_partitions.
+        ///
+        /// Note for partitioned hash join dynamic filtering:
+        /// preserving file partitions can allow partition-index routing (`i -> i`) instead of
+        /// CASE-hash routing, but this assumes build/probe partition indices stay aligned for
+        /// dynamic filter consumers.

Review Comment:
   ```suggestion
           /// CASE-hash routing, but this assumes build/probe partition indices stay aligned; otherwise the query may return incorrect results.
   ```
   
   I would also add a small example to clarify what “aligned” means -- for
example, the range 0–5 sits on partition 0 on both the build and probe sides,
the next range sits on partition 1 on both sides, and so on.
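
   To make "aligned" concrete, here is a minimal sketch. It is only an
   illustration: `KeyRange` and `is_aligned` are hypothetical names and not
   DataFusion APIs, and it assumes each partition can be described by the
   inclusive range of values it holds.

   ```rust
   use std::ops::RangeInclusive;

   /// Hypothetical model: each partition is described by the range of values it holds.
   type KeyRange = RangeInclusive<i64>;

   /// Returns true when both sides have the same number of partitions and
   /// partition `i` covers the same range on the build and probe sides.
   fn is_aligned(build: &[KeyRange], probe: &[KeyRange]) -> bool {
       build.len() == probe.len() && build.iter().zip(probe).all(|(b, p)| b == p)
   }
   ```

   With build `[0..=5, 6..=10]` and probe `[0..=5, 6..=10]` the sides are
   aligned; if the probe side were `[6..=10, 0..=5]`, routing `i -> i` would
   pair a build partition with the wrong probe partition.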



##########
datafusion/physical-optimizer/src/enforce_distribution.rs:
##########
@@ -1454,21 +1480,58 @@ pub fn ensure_distribution(
         plan.with_new_children(children_plans)?
     };
 
+    // For partitioned hash joins, decide dynamic filter routing mode.
+    //
+    // PartitionIndex routing requires that partition `i` on the build side corresponds to
+    // partition `i` on the probe side. This holds when both sides' partitioning comes from
+    // file-grouped sources (via `preserve_file_partitions`) rather than hash repartitioning.
+    plan = if let Some(hash_join) = plan.as_any().downcast_ref::<HashJoinExec>()
+        && matches!(hash_join.mode, PartitionMode::Partitioned)
+    {
+        let routing_mode =

Review Comment:
   For a `Partitioned` join, is it possible for one side to preserve its
partitioning while the other does not (that is, a `RepartitionExec` on one
side only)? Would that be a misuse of the API? If so, should we return an error?
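
   If the mixed case does turn out to be a misuse, the check could look roughly
   like the sketch below. Everything here is hypothetical: `PartitionSource`,
   `RoutingMode`, and `choose_routing` are illustrative names and not the types
   used in this PR, and the sketch assumes each side can be classified as
   either file-grouped or hash-repartitioned.

   ```rust
   /// Hypothetical: how one join input ended up with its partitioning.
   #[derive(Debug, Clone, Copy, PartialEq, Eq)]
   enum PartitionSource {
       /// Partitions come from file groups (e.g. via `preserve_file_partitions`).
       FileGrouped,
       /// Partitions come from a hash repartition above the source.
       HashRepartitioned,
   }

   /// Hypothetical routing modes, named after the two strategies discussed here.
   #[derive(Debug, Clone, Copy, PartialEq, Eq)]
   enum RoutingMode {
       PartitionIndex,
       CaseHash,
   }

   /// Picks a routing mode and rejects plans where the two sides disagree
   /// on how their partitioning was produced.
   fn choose_routing(
       build: PartitionSource,
       probe: PartitionSource,
   ) -> Result<RoutingMode, String> {
       use PartitionSource::*;
       match (build, probe) {
           (FileGrouped, FileGrouped) => Ok(RoutingMode::PartitionIndex),
           (HashRepartitioned, HashRepartitioned) => Ok(RoutingMode::CaseHash),
           (b, p) => Err(format!(
               "partitioned hash join has mismatched inputs: build={b:?}, probe={p:?}"
           )),
       }
   }
   ```

   Whether the mixed case should be an error or a silent fallback to CASE-hash
   routing is exactly the open question; the sketch only shows the error
   variant.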



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

