alamb opened a new issue, #17719:
URL: https://github.com/apache/datafusion/issues/17719

   This is part of a larger discussion for improving the APIs for Join planning 
in DataFusion 
   - https://github.com/apache/datafusion/issues/17718
   
   Specifically, many more sophisticated join ordering algorithms rely on a 
data structure called a "JoinGraph" (defined below)
   
   In DataFusin terms, a JoinGraph could be formed from any tree of 
`LogicalPlan::Join` or corresponding PhysicalPlan
   
   
   Some ideas:
   ```rust
   /// represents a join graph
   struct JoinGraph {
   ...
   }
   
   impl JoinGraph {
     // make a new join graph from a plan
     // NOTE this maybe should be in terms of physical plans / `Arc<dyn 
ExecutionPlan>`
     fn new_from_plan(logical_plan: LogicalPlan) -> Result<Self> {..}
   ...
     // create a new plan from a join grap
     fn into_plan(self) -> Result<LogicalPlan>
   }
   ```
   
   ### Join Graphs
   Quoting a pretty good [Google AI 
summary](https://www.google.com/search?q=joingraph+data+structure+database+definition+%28not+graph+database):
   
   A join graph in the context of relational databases is a data structure that 
represents the relationships and potential join operations between tables in a 
relational database schema. 
   
   Here's a breakdown of its definition:
   * [Vertices (Nodes): 
](https://www.google.com/search?sca_esv=3cba3ff7c6207a53&cs=1&q=Vertices+%28Nodes%29)Each
 vertex in a join graph corresponds to a table (or relation) in the relational 
database schema.
   * [Edges: ](https://www.google.com/search?sca_esv=3cba3ff7c6207a53)An edge 
between two vertices (tables) signifies a join relationship between those 
tables (e.g an equality predicate).
   * [Edge Labels (Join Conditions): 
](https://www.google.com/search?sca_esv=3cba3ff7c6207a53&cs=1&q=Edge+Labels+%28Join+Conditions%29)Edges
 are labeled with the specific join conditions that define how the connected 
tables are to be joined. These conditions specify the attributes that must 
match for a successful join (e.g., TableA.ID = TableB.TableA_ID).
   
   Join graphs are used to:
   * [Plan Query Execution: 
](https://www.google.com/search?sca_esv=3cba3ff7c6207a53&cs=1&q=Plan+Query+Execution)Database
 query optimizers can use join graphs to identify efficient ways to execute 
queries involving multiple joins.
   
   For example, TPCH Q3 is expressed in SQL like this:
   ```sql
   select
       l_orderkey,
       sum(l_extendedprice * (1 - l_discount)) as revenue,
       o_orderdate,
       o_shippriority
   from
       customer,
       orders,
       lineitem
   where
           c_mktsegment = 'BUILDING'
     and c_custkey = o_custkey
     and l_orderkey = o_orderkey
     and o_orderdate < date '1995-03-15'
     and l_shipdate > date '1995-03-15'
   group by
       l_orderkey,
       o_orderdate,
       o_shippriority
   order by
       revenue desc,
       o_orderdate
   limit 10;
   ```
   
   It can be represented with the following JoinGraph:
   
   ```mermaid
   graph TD
       customer["customer"]
       orders["orders"]
       lineitem["lineitem"]
   
       customer -->|c_custkey = o_custkey| orders
       orders -->|o_orderkey = l_orderkey| lineitem
   ```
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to