mingmwang commented on issue #342:
URL: https://github.com/apache/arrow-ballista/issues/342#issuecomment-1276470843
Today, For partitioned hash join, DataFusion already support CollectLeft
model, I think it is similar to the Broadcast HashJoin. I do not get a chance
to test it on Ballista yet, but I think it should work in the distribution
model. The downside is
the Left side might cause lots of duplicate re-computations.
````
fn required_input_distribution(&self) -> Vec<Distribution> {
match self.mode {
PartitionMode::CollectLeft => vec![
Distribution::SinglePartition,
Distribution::UnspecifiedDistribution,
],
PartitionMode::Partitioned => {
let (left_expr, right_expr) = self
.on
.iter()
.map(|(l, r)| {
(
Arc::new(l.clone()) as Arc<dyn PhysicalExpr>,
Arc::new(r.clone()) as Arc<dyn PhysicalExpr>,
)
})
.unzip();
vec![
Distribution::HashPartitioned(left_expr),
Distribution::HashPartitioned(right_expr),
]
}
}
}
````
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]