avantgardnerio opened a new pull request, #5362:
URL: https://github.com/apache/arrow-datafusion/pull/5362

   # Which issue does this PR close?
   
   Closes #5357.
   
   # Rationale for this change
   
   If the planner/optimizer knows how a table is (or can be) sorted, it can push more predicates down to the `TableProvider`.
   
   For example, TPC-H query 9 might perform far better, since the `partsupp` table could be naturally ordered on its primary key `(PS_PARTKEY, PS_SUPPKEY)`:
   
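   ```sql
   SELECT
       nation,
       o_year,
       SUM(amount) AS sum_profit
   FROM
       (
           SELECT
               n_name AS nation,
               EXTRACT(YEAR FROM o_orderdate) AS o_year,
               l_extendedprice * (1 - l_discount) - ps_supplycost * l_quantity AS amount
           FROM
               part,
               supplier,
               lineitem,
               partsupp,
               orders,
               nation
           WHERE
               s_suppkey = l_suppkey
               AND ps_suppkey = l_suppkey
               AND ps_partkey = l_partkey
               AND p_partkey = l_partkey
               AND o_orderkey = l_orderkey
               AND s_nationkey = n_nationkey
               AND p_name LIKE '%green%'
       ) AS profit
   GROUP BY
       nation,
       o_year
   ORDER BY
       nation,
       o_year DESC;
   ```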
   
   Once the subquery is flattened into joins, the planner will be joining on the `partsupp` primary key, so the plan will likely look like:
   
   ```
   EquiJoin(ps_suppkey = l_suppkey and ps_partkey = l_partkey)
      Sort(PS_PARTKEY, PS_SUPPKEY)
         TableScan [filter=s_suppkey]
   ```
   
   However, the scan could filter out far more rows by using both key columns, and the sort could be avoided entirely because the table is already stored in that order.
   
   # What changes are included in this PR?
   
   An interface change that allows `TableProvider`s to inform the planner about single- or multi-column, primary or secondary indexes, so that a future (fast-follow) PR can automatically push filters and sorts down into the `TableProvider`.
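The following is a minimal sketch of what such an interface could look like, written in plain Rust with no DataFusion dependency. `TableProviderSketch`, `IndexInfo`, `IndexKind`, `indexes()` and `sort_is_redundant` are hypothetical names chosen for illustration; they are not the API this PR adds. A default `indexes()` that returns an empty list is one way existing providers could stay unaffected.

```rust
/// Hypothetical: whether an index is the table's primary key or a secondary index.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum IndexKind {
    Primary,
    Secondary,
}

/// Hypothetical description of one index a provider can advertise.
#[derive(Debug, Clone, PartialEq, Eq)]
struct IndexInfo {
    // Informational only in this sketch; the sort check below ignores it.
    #[allow(dead_code)]
    kind: IndexKind,
    /// Columns in index order; more than one entry means a multi-column index.
    columns: Vec<String>,
}

/// Hypothetical, trimmed-down stand-in for a table provider trait.
trait TableProviderSketch {
    fn name(&self) -> &str;

    /// Defaults to "no known indexes", so implementations that never override
    /// this method keep compiling and behaving as before.
    fn indexes(&self) -> Vec<IndexInfo> {
        Vec::new()
    }
}

/// Planner-side check: a sort on `required` is redundant if some index's
/// columns start with exactly those columns, in the same order.
fn sort_is_redundant(provider: &dyn TableProviderSketch, required: &[&str]) -> bool {
    provider.indexes().iter().any(|idx| {
        idx.columns.len() >= required.len()
            && idx.columns.iter().zip(required).all(|(have, want)| have.as_str() == *want)
    })
}

/// A pre-existing provider that never overrides `indexes()`.
struct LegacyCsvTable;

impl TableProviderSketch for LegacyCsvTable {
    fn name(&self) -> &str {
        "legacy_csv"
    }
}

/// A provider stored in primary-key order, with one secondary index.
struct PartSuppTable;

impl TableProviderSketch for PartSuppTable {
    fn name(&self) -> &str {
        "partsupp"
    }

    fn indexes(&self) -> Vec<IndexInfo> {
        vec![
            IndexInfo {
                kind: IndexKind::Primary,
                columns: vec!["ps_partkey".to_string(), "ps_suppkey".to_string()],
            },
            IndexInfo {
                kind: IndexKind::Secondary,
                columns: vec!["ps_suppkey".to_string()],
            },
        ]
    }
}

fn main() {
    let providers: [&dyn TableProviderSketch; 2] = [&LegacyCsvTable, &PartSuppTable];
    let required = ["ps_partkey", "ps_suppkey"];
    for provider in providers {
        let redundant = sort_is_redundant(provider, &required);
        println!("{}: sort on {:?} redundant = {}", provider.name(), required, redundant);
    }
}
```

With this sketch, `partsupp` reports the `(ps_partkey, ps_suppkey)` sort as redundant and `legacy_csv` does not. Because the check is a prefix match, an index on `(a, b)` also satisfies a sort on `(a)` alone, but not a sort on `(b, a)`.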
   
   # Are these changes tested?
   
   No, it's only an interface change, though I could add a test for the interface itself.
   
   # Are there any user-facing changes?
   
   No; existing `TableProvider`s should continue to work unchanged.