thinkharderdev commented on code in PR #12523:
URL: https://github.com/apache/datafusion/pull/12523#discussion_r1769224022
##########
datafusion/physical-plan/src/joins/hash_join.rs:
##########
@@ -71,11 +71,68 @@ use datafusion_physical_expr::equivalence::{
 use datafusion_physical_expr::PhysicalExprRef;
 use ahash::RandomState;
+use arrow_buffer::BooleanBuffer;
 use datafusion_expr::Operator;
 use datafusion_physical_expr_common::datum::compare_op_for_nested;
 use futures::{ready, Stream, StreamExt, TryStreamExt};
 use parking_lot::Mutex;
+/// `SharedJoinState` provides an extension point allowing
+/// `HashJoinStream` to share the `visited_indices_bitmap` of the build side of a join
+/// across probe tasks without shared memory.
+///
+/// This can be used to, for example, implement a left outer join efficiently as a broadcast join
+/// if the left side is small

Review Comment:
No, it's the opposite. The left (build) side is small and can be efficiently broadcast. Then the right (probe) side can be partitioned across multiple nodes, with the build side broadcast to all of them.
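To make the point concrete, here is a minimal single-process sketch of why a broadcast left outer join needs the build side's visited bitmap to be shared across probe tasks. It assumes each right-side partition is probed independently against a copy of the left (build) hash table. The names `SharedVisitedState`, `InMemoryVisited`, `probe_partition`, and `broadcast_left_outer_join` are hypothetical illustrations, not DataFusion's API, and `Vec<bool>` stands in for arrow's `BooleanBuffer`.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for the extension point in the diff: each probe task
/// reports which build-side rows it matched, and the shared state ORs them together.
trait SharedVisitedState {
    /// Merge one probe task's local visited bitmap into the shared bitmap.
    fn merge(&mut self, local: &[bool]);
    /// Build-side row indices that no probe task matched.
    fn unmatched(&self) -> Vec<usize>;
}

/// In-memory implementation (hypothetical); a distributed engine would
/// instead ship each task's bitmap to wherever the merge happens.
struct InMemoryVisited {
    visited: Vec<bool>,
}

impl SharedVisitedState for InMemoryVisited {
    fn merge(&mut self, local: &[bool]) {
        for (v, l) in self.visited.iter_mut().zip(local) {
            *v |= *l;
        }
    }

    fn unmatched(&self) -> Vec<usize> {
        self.visited
            .iter()
            .enumerate()
            .filter(|&(_, &v)| !v)
            .map(|(i, _)| i)
            .collect()
    }
}

/// Probe one right-side partition against the broadcast build-side hash table.
/// Returns matched (build_row, probe_key) pairs plus this task's local bitmap.
fn probe_partition(
    build_index: &HashMap<i32, Vec<usize>>,
    build_len: usize,
    probe: &[i32],
) -> (Vec<(usize, i32)>, Vec<bool>) {
    let mut visited = vec![false; build_len];
    let mut matches = Vec::new();
    for &key in probe {
        if let Some(rows) = build_index.get(&key) {
            for &row in rows {
                visited[row] = true;
                matches.push((row, key));
            }
        }
    }
    (matches, visited)
}

/// Left outer join with the small left (build) side broadcast to every probe
/// partition. Returns (left_key, Option<right_key>) rows.
fn broadcast_left_outer_join(left: &[i32], right_partitions: &[Vec<i32>]) -> Vec<(i32, Option<i32>)> {
    // Build the hash table once; in a cluster a copy would go to every node.
    let mut build_index: HashMap<i32, Vec<usize>> = HashMap::new();
    for (i, &k) in left.iter().enumerate() {
        build_index.entry(k).or_default().push(i);
    }

    let mut shared = InMemoryVisited { visited: vec![false; left.len()] };
    let mut out = Vec::new();
    for partition in right_partitions {
        let (matches, local) = probe_partition(&build_index, left.len(), partition);
        out.extend(matches.into_iter().map(|(row, key)| (left[row], Some(key))));
        shared.merge(&local);
    }

    // Only after every partition has reported can unmatched left rows be
    // emitted, and each is emitted once rather than once per partition.
    for row in shared.unmatched() {
        out.push((left[row], None));
    }
    out
}

fn main() {
    let rows = broadcast_left_outer_join(&[1, 2, 3], &[vec![1, 4], vec![1, 2]]);
    println!("{:?}", rows);
}
```

With the inputs in `main`, left key `2` is matched only in the second partition. If each probe task emitted its own unmatched rows from its local bitmap, the first partition would wrongly emit `(2, None)` and both partitions would emit `(3, None)`; merging the bitmaps first yields `(3, None)` exactly once.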