gene-bordegaray commented on code in PR #19304:
URL: https://github.com/apache/datafusion/pull/19304#discussion_r2621054296
##########
datafusion/physical-optimizer/src/enforce_distribution.rs:
##########
@@ -889,32 +889,41 @@ fn add_roundrobin_on_top(
/// * `hash_exprs`: Stores Physical Exprs that are used during hashing.
/// * `n_target`: desired target partition number, if partition number of the
/// current executor is less than this value. Partition number will be increased.
+/// * `allow_subset`: Whether to allow subset partitioning logic in satisfaction checks.
+/// Set to `false` for partitioned hash joins to ensure exact hash matching.
///
/// # Returns
///
/// A [`Result`] object that contains new execution plan where the desired
/// distribution is satisfied by adding a Hash repartition.
fn add_hash_on_top(
input: DistributionContext,
- hash_exprs: Vec<Arc<dyn PhysicalExpr>>,
+ hash_exprs: &[Arc<dyn PhysicalExpr>],
n_target: usize,
+ allow_subset: bool,
Review Comment:
I actually went down this path and decided that making this a property of
the hash expression itself felt incorrect. From doing this and something
similar (creating metadata for partitioning, which I think has other great use
cases that I will make a ticket for), I came to the conclusion that it is the
job of the optimizer to inject what should happen based on the plan, not the
job of the partitioning to decide it up front.
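
To make the distinction concrete, here is a minimal sketch of the idea, not the code in this PR. `HashPartitioning`, `Operator`, `allow_subset_for`, and `satisfies` are hypothetical, simplified stand-ins that use strings in place of `Arc<dyn PhysicalExpr>`; none of them are DataFusion's real types or signatures. The point is only that the partitioning stays a plain description of the data, while the optimizer picks the flag from the plan node it is looking at.

```rust
// Hypothetical, simplified stand-ins; these are NOT DataFusion's real types.
// The partitioning only describes how the data is laid out. It carries no
// policy about how strictly it should be matched.
#[derive(Debug, Clone, PartialEq)]
struct HashPartitioning {
    exprs: Vec<String>,
}

// Hypothetical plan-node kinds the optimizer might be inspecting.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Operator {
    PartitionedHashJoin,
    Aggregate,
}

// The optimizer derives the policy from the plan: partitioned hash joins
// need exact hash matching, other operators may accept a subset.
fn allow_subset_for(op: Operator) -> bool {
    !matches!(op, Operator::PartitionedHashJoin)
}

// The satisfaction check receives the policy as an argument instead of
// reading it off the partitioning.
fn satisfies(existing: &HashPartitioning, required: &[String], allow_subset: bool) -> bool {
    // An exact match always satisfies the requirement.
    if existing.exprs.as_slice() == required {
        return true;
    }
    // With subset logic, data hashed on [a] also satisfies a requirement
    // on [a, b]: rows that agree on (a, b) agree on a, so they are already
    // in the same partition.
    allow_subset
        && !existing.exprs.is_empty()
        && existing.exprs.iter().all(|e| required.contains(e))
}
```

With this shape, the same existing partitioning on `a` satisfies a requirement on `[a, b]` under an aggregate but not under a partitioned hash join, without the partitioning itself having to know which operator consumes it.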
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]