JanKaul opened a new pull request, #22050:
URL: https://github.com/apache/datafusion/pull/22050
## Which issue does this PR close?
- Part of #17719
- Part of #18250
## Rationale for this change
Join-ordering algorithms (DPhyp, DPccp, …) operate on a graph view of
the join region rather than a `LogicalPlan` tree. DataFusion has no
such structure today, so any future reordering rule has to re-derive
one. This PR adds the data structure and the `LogicalPlan ⇄ JoinGraph`
boundary so the follow-up enumeration work in epic #18249 has
something concrete to build on.
## What changes are included in this PR?
New `datafusion/optimizer/src/reorder_join/join_graph.rs`:
- `JoinGraph`, `Node`, `Edge` with `NodeId` / `EdgeId` handles backed
by an internal `VecMap` (stable indices, no reuse on removal).
- `JoinGraph::try_from_logical_plan(plan) -> Result<(JoinGraph, Vec<LogicalPlan>)>`:
  - strips wrapper operators above the topmost join and returns them
    so the caller can reapply them after reordering;
  - decomposes inner joins into nodes (leaf relations) and edges
    (equi-join predicates);
  - hoists non-equi predicates (both `Join.filter` and `Filter` nodes
    sitting between inner joins) into a side-channel `filters` list;
  - treats non-inner joins and other operators nested between joins
    (Aggregate, Projection, …) as opaque leaves.
- `reconstruct_plan(join_plan, wrappers)` re-applies the stripped
wrappers after reordering.
- Mutation API for the future enumerator: `add_node`,
`add_node_with_edge`, `remove_node`, `remove_edge`,
`Node::neighbours`, `Node::connections`.
- Module exported from `datafusion/optimizer/src/lib.rs`.
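
For readers unfamiliar with the "stable indices, no reuse on removal" property mentioned in the list above, here is a minimal sketch of one way such a map can behave. The `VecMap` below is a hypothetical stand-in written for illustration, not the type added in this PR; its method names and tombstone layout are assumptions.

```rust
/// Hypothetical stand-in for the internal `VecMap`; the real type may differ.
#[derive(Debug)]
struct VecMap<T> {
    slots: Vec<Option<T>>,
}

impl<T> VecMap<T> {
    fn new() -> Self {
        Self { slots: Vec::new() }
    }

    /// Always appends, so an index freed by `remove` is never handed out again.
    fn insert(&mut self, value: T) -> usize {
        self.slots.push(Some(value));
        self.slots.len() - 1
    }

    /// Leaves a tombstone (`None`) behind so later indices do not shift.
    fn remove(&mut self, id: usize) -> Option<T> {
        self.slots.get_mut(id).and_then(Option::take)
    }

    fn get(&self, id: usize) -> Option<&T> {
        self.slots.get(id).and_then(Option::as_ref)
    }
}

fn main() {
    let mut nodes = VecMap::new();
    let a = nodes.insert("orders");
    let b = nodes.insert("customers");
    nodes.remove(a);
    let c = nodes.insert("lineitem");

    // The removed slot stays empty and the new entry gets a fresh index.
    assert_eq!((a, b, c), (0, 1, 2));
    assert!(nodes.get(a).is_none());
    assert_eq!(nodes.get(b), Some(&"customers"));
}
```

The point of this layout is that a `NodeId` or `EdgeId` held elsewhere can never silently start referring to a different element after a removal.
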
No optimizer rule is registered; nothing consumes `JoinGraph` outside
tests.
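
The wrapper round trip (strip the operators above the topmost join, reorder, then reapply) can be pictured with a toy model. This is a sketch under stated assumptions: `Plan` is a hypothetical stand-in for `LogicalPlan`, wrappers are reduced to names, and the outermost-first ordering of the returned list is an assumption rather than something this description specifies.

```rust
/// Toy plan; hypothetical stand-in for `LogicalPlan`.
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    /// The join region, kept abstract here.
    Join(&'static str),
    /// A single-input operator sitting above it (Limit, Sort, ...).
    Wrapper(&'static str, Box<Plan>),
}

/// Peel wrappers off the top of the plan, outermost first (assumed order).
fn strip_wrappers(mut plan: Plan) -> (Plan, Vec<&'static str>) {
    let mut wrappers = Vec::new();
    while let Plan::Wrapper(name, input) = plan {
        wrappers.push(name);
        plan = *input;
    }
    (plan, wrappers)
}

/// Reapply the wrappers, innermost first, on top of the (reordered) join.
fn reconstruct_plan(join_plan: Plan, wrappers: Vec<&'static str>) -> Plan {
    wrappers
        .into_iter()
        .rev()
        .fold(join_plan, |input, name| Plan::Wrapper(name, Box::new(input)))
}

fn main() {
    let original = Plan::Wrapper(
        "Limit",
        Box::new(Plan::Wrapper("Sort", Box::new(Plan::Join("a JOIN b")))),
    );

    let (join, wrappers) = strip_wrappers(original.clone());
    assert_eq!(join, Plan::Join("a JOIN b"));
    assert_eq!(wrappers, ["Limit", "Sort"]);

    // With no reordering in between, the round trip is the identity.
    assert_eq!(reconstruct_plan(join, wrappers), original);
}
```
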
## Are these changes tested?
Yes — unit tests in `join_graph.rs`:
- three-way inner join with a non-equi `Join.filter` (predicate lands
in side-channel);
- `Filter` between two inner joins (hoisted; both joins still
decompose);
- `Aggregate` between two inner joins (opaque leaf);
- `LEFT` join nested inside an inner chain (opaque leaf);
- top-level non-inner join (single opaque leaf).
No sqllogictest changes — no planner-visible behavior yet.
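
To make the tested cases concrete, here is a toy model of the decomposition. It is not the code in `join_graph.rs`: `Plan`, `Graph`, and `decompose` are hypothetical names, predicates are plain strings, and the handling of a `Filter` that is not directly above an inner join (treated as a leaf) is an assumption of this sketch.

```rust
/// Toy plan tree; hypothetical stand-in for `LogicalPlan`.
enum Plan {
    Scan(&'static str),
    InnerJoin {
        left: Box<Plan>,
        right: Box<Plan>,
        on: Vec<(&'static str, &'static str)>, // equi-join key pairs
        filter: Option<&'static str>,          // non-equi residual predicate
    },
    Filter { predicate: &'static str, input: Box<Plan> },
    /// Anything the graph does not look inside (Aggregate, LEFT join, ...).
    Opaque(&'static str),
}

#[derive(Debug, Default)]
struct Graph {
    nodes: Vec<&'static str>,
    edges: Vec<(&'static str, &'static str)>,
    filters: Vec<&'static str>, // side channel for non-equi predicates
}

fn decompose(plan: &Plan, g: &mut Graph) {
    match plan {
        Plan::InnerJoin { left, right, on, filter } => {
            decompose(left, g);
            decompose(right, g);
            g.edges.extend(on.iter().copied());
            g.filters.extend(filter.iter().copied());
        }
        // A filter directly above an inner join is hoisted and the join below
        // it is still decomposed.
        Plan::Filter { predicate, input } if matches!(**input, Plan::InnerJoin { .. }) => {
            g.filters.push(*predicate);
            decompose(input, g);
        }
        // Assumption of this sketch: any other filter is just a leaf.
        Plan::Filter { .. } => g.nodes.push("Filter(..)"),
        Plan::Scan(name) | Plan::Opaque(name) => g.nodes.push(*name),
    }
}

fn main() {
    // (Filter(a JOIN b)) JOIN Aggregate(c), with a non-equi join filter on top.
    let plan = Plan::InnerJoin {
        left: Box::new(Plan::Filter {
            predicate: "a.x < b.y",
            input: Box::new(Plan::InnerJoin {
                left: Box::new(Plan::Scan("a")),
                right: Box::new(Plan::Scan("b")),
                on: vec![("a.id", "b.id")],
                filter: None,
            }),
        }),
        right: Box::new(Plan::Opaque("Aggregate(c)")),
        on: vec![("b.id", "c.id")],
        filter: Some("a.v <> c.v"),
    };

    let mut g = Graph::default();
    decompose(&plan, &mut g);

    assert_eq!(g.nodes, ["a", "b", "Aggregate(c)"]);
    assert_eq!(g.edges, [("a.id", "b.id"), ("b.id", "c.id")]);
    assert_eq!(g.filters, ["a.x < b.y", "a.v <> c.v"]);
}
```

In this model a top-level non-inner join is simply a single `Opaque` plan, which yields one node and no edges.
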
## Are there any user-facing changes?
No. `JoinGraph` is a new internal data structure in
`datafusion-optimizer`; no existing API changes and no rule consumes
it yet.