NGA-TRAN opened a new issue, #18250:
URL: https://github.com/apache/datafusion/issues/18250

   ### Is your feature request related to a problem or challenge?
   
   This task is part of feature #18249. It illustrates what a query’s join graph looks like, using TPC-H Q5 as the example.
   
   ### Describe the solution you'd like
   
   Our example, TPC-H Query 5, joins 6 tables using 6 join predicates:
   
   ```SQL
   select
      n_name,
      sum(l_extendedprice * (1 - l_discount)) as revenue
   from
      customer,
      orders,
      lineitem,
      supplier,
      nation,
      region
   where
          c_custkey = o_custkey
    and l_orderkey = o_orderkey
    and l_suppkey = s_suppkey
    and c_nationkey = s_nationkey
    and s_nationkey = n_nationkey
    and n_regionkey = r_regionkey
    and r_name = 'ASIA'
    and o_orderdate >= date '1994-01-01' 
    and o_orderdate < date '1995-01-01'
   group by
      n_name
   order by
      revenue desc;
   ```
   
   #### Join Graph
   
   Since our focus is join order enumeration, we’ll represent the query as a 
join graph:
   - Tables are shown as circles
   - Joins appear as edges between circles:
         - Directed edges indicate many-to-one joins
         - Undirected edges indicate many-to-many joins
   - Columns inside a circle denote selection predicates (filters) on that table
         - The orders table is filtered on o_orderdate with ~15% selectivity
         - The region table is filtered on r_name with 20% selectivity
   - Group-by column: n_name
   - Order-by column: revenue (the aggregate)
   - Color coding reflects table partitioning, sort order, and size category
   
   These properties illustrate factors that influence join enumeration in the next section. They’re optional and can be extended or omitted based on specific needs.
   
   <img width="336" height="380" alt="Image" src="https://github.com/user-attachments/assets/d4198939-9df4-483a-937a-4b25ca143d97" />
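
For readers who cannot view the image, the join graph described above could be written down as a small data structure. This is only a sketch, not DataFusion code: `JoinEdge`, `JoinGraph`, and `build_q5_graph` are hypothetical names. The edge directions are an assumption inferred from the TPC-H primary-key/foreign-key relationships, not read from the figure: every join is many-to-one except `c_nationkey = s_nationkey`, which compares two non-key columns and is therefore many-to-many.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class JoinEdge:
    left: str          # for many-to-one joins, the "many" side
    right: str         # for many-to-one joins, the "one" side
    condition: str     # join predicate as written in the query
    many_to_one: bool  # True: directed edge left -> right; False: undirected


@dataclass
class JoinGraph:
    # table name -> selection predicates (filters) on that table
    filters: dict[str, list[str]] = field(default_factory=dict)
    edges: list[JoinEdge] = field(default_factory=list)

    def add_table(self, name: str, *predicates: str) -> None:
        self.filters[name] = list(predicates)

    def add_join(self, left: str, right: str, condition: str,
                 many_to_one: bool) -> None:
        if left not in self.filters or right not in self.filters:
            raise KeyError(f"unknown table in join: {left}, {right}")
        self.edges.append(JoinEdge(left, right, condition, many_to_one))

    def neighbors(self, table: str) -> list[str]:
        """Tables joined to `table`, ignoring edge direction."""
        out = set()
        for e in self.edges:
            if e.left == table:
                out.add(e.right)
            elif e.right == table:
                out.add(e.left)
        return sorted(out)

    def is_connected(self) -> bool:
        """True if all tables are linked by joins (no cross join needed)."""
        if not self.filters:
            return True
        start = next(iter(self.filters))
        seen, stack = {start}, [start]
        while stack:
            for n in self.neighbors(stack.pop()):
                if n not in seen:
                    seen.add(n)
                    stack.append(n)
        return len(seen) == len(self.filters)


def build_q5_graph() -> JoinGraph:
    g = JoinGraph()
    for t in ("customer", "lineitem", "supplier", "nation"):
        g.add_table(t)
    g.add_table("orders",
                "o_orderdate >= date '1994-01-01'",
                "o_orderdate < date '1995-01-01'")
    g.add_table("region", "r_name = 'ASIA'")

    # Directed (many-to-one) edges: assumed from TPC-H key relationships.
    g.add_join("orders", "customer", "c_custkey = o_custkey", True)
    g.add_join("lineitem", "orders", "l_orderkey = o_orderkey", True)
    g.add_join("lineitem", "supplier", "l_suppkey = s_suppkey", True)
    g.add_join("supplier", "nation", "s_nationkey = n_nationkey", True)
    g.add_join("nation", "region", "n_regionkey = r_regionkey", True)
    # Undirected (many-to-many) edge: neither column is a key of its table.
    g.add_join("customer", "supplier", "c_nationkey = s_nationkey", False)
    return g


if __name__ == "__main__":
    for e in build_q5_graph().edges:
        arrow = "->" if e.many_to_one else "--"
        print(f"{e.left} {arrow} {e.right}  [{e.condition}]")
```

Group-by, order-by, partitioning, sort order, and size category are left out to keep the sketch short. They could be added as extra fields if a later enumeration step needs them.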
   
   
   ### Describe alternatives you've considered
   
   - 
   
   ### Additional context
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

