Re: [PR] perf: Convert inner joins to semi joins when equivalent [datafusion]

via GitHub Sun, 31 May 2026 06:40:28 -0700


neilconway commented on code in PR #22652:
URL: https://github.com/apache/datafusion/pull/22652#discussion_r3330294444



##########
datafusion/optimizer/src/eliminate_join.rs:
##########
@@ -15,19 +15,75 @@
 // specific language governing permissions and limitations
 // under the License.
 
-//! [`EliminateJoin`] rewrites `INNER JOIN` with `true`/`null`
-use crate::optimizer::ApplyOrder;
+//! [`EliminateJoin`] rewrites inner joins to simpler forms to make them 
cheaper
+//! to evaluate.
+//!
+//! # What it rewrites
+//!
+//! * An inner join can be rewritten to an empty relation if the join condition
+//!   is trivially false.
+//!
+//! * An inner join `L ⋈ R` can be rewritten to a left semi join `L ⋉ R`
+//!   (`LeftSemi`), which keeps the rows of L that have a match in R and 
outputs
+//!   only L's columns. The rewrite to `L ⋉ R` is valid when both of the
+//!   following are true:
+//!
+//!     1. None of R's columns are referenced above the join.
+//!     2. R does not observably multiply L's rows. This holds when either the
+//!        join's ancestors are duplicate-insensitive (e.g., DISTINCT) or we 
can use
+//!        functional dependencies to prove that each L row matches at most 
one R
+//!        row (R is provably unique on the join keys).
+//!
+//! # How it works
+//!
+//! `rewrite_subtree` walks the plan top-down, threading two pieces of context
+//! down to each join:
+//!
+//! * `live` — which of the join's output columns are referenced above it. It 
is
+//!   propagated top-down: each node asks its children only for the columns it
+//!   needs from them, so a projection or aggregate asks for just the columns 
its
+//!   expressions reference, dropping the rest (the narrowing); a join splits 
the
+//!   set across its two inputs.
+//! * `duplicate_insensitive` — whether emitting each row once instead of many
+//!   times will not change the output. A duplicate-collapsing node (e.g.,
+//!   DISTINCT, GROUP BY with no aggregate functions, or the existence side of 
a
+//!   semi/anti/mark join) sets it `true` for its subtree, and it propagates
+//!   downward until a node that makes the row count observable again (a 
`LIMIT`,
+//!   a top-N sort, ...) clears it. It is therefore fixed by the nearest such
+//!   node, not by the whole ancestor chain: a collapsing node shields its 
subtree,
+//!   so a duplicate-sensitive node further above does not matter.
+//!
+//! At each join, `rewritten_join_type` combines this context with the side's
+//! functional dependencies to choose `Inner`, `LeftSemi`, or `RightSemi`. Most
+//! node types just forward the context to their single child via
+//! `rewrite_single_input`; nodes that alter column requirements or
+//! duplicate-sensitivity (projection, aggregate, sort, ...) adjust it first.
+//!
+//! Joins nested inside subquery expressions are reached as well: 
`rewrite_subtree`
+//! descends into each node's subquery plans itself (via `map_subqueries`),
+//! seeding each as a fresh root.
+use crate::utils::for_each_referenced_index;
 use crate::{OptimizerConfig, OptimizerRule};
-use datafusion_common::tree_node::Transformed;
-use datafusion_common::{Result, ScalarValue};
-use datafusion_expr::JoinType::Inner;
+use datafusion_common::tree_node::{Transformed, TreeNode};
+use datafusion_common::{
+    DFSchema, Dependency, HashSet, NullEquality, Result, ScalarValue,
+};
 use datafusion_expr::{
-    Expr,
-    logical_plan::{EmptyRelation, LogicalPlan},
+    Expr, JoinType, SortExpr,
+    logical_plan::{
+        Aggregate, Distinct, DistinctOn, EmptyRelation, Filter, Join, Limit, 
LogicalPlan,
+        Partitioning, Projection, Repartition, Sort, SubqueryAlias,
+    },
 };
+use std::sync::Arc;
+
+/// The columns that are "live" at a plan node, i.e., which of its output
+/// columns are referenced by an ancestor node. Represented as a set of column
+/// indices, relative to the node's schema.
+type LiveColumns = HashSet<usize>;

Review Comment:
   1. This is similar but not identical to the `RequiredIndices` data structure 
used by `OptimizeProjections`. `RequiredIndices` cares about insertion order 
but we don't, so it seemed cleaner to use a different data structure here.
   2. It would be better to use a bitmap than a `HashSet`. We could do so by 
adding a dependency on a third-party bitmap implementation (e.g., 
https://github.com/petgraph/fixedbitset, which one of our indirect dependencies 
already pulls in). But the performance impact should be small, so I'm not sure 
it's worth adding the dep.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] perf: Convert inner joins to semi joins when equivalent [datafusion]

Reply via email to