sgrebnov opened a new pull request, #13267:
URL: https://github.com/apache/datafusion/pull/13267

   ## Which issue does this PR close?
   
   The `optimize_projections` optimization is very useful when used alongside 
unparsing logic, as it pushes down projections to the `TableScan` and ensures 
only required columns are fetched. The downside of this process is that the 
rule modifies the original plan in a way that makes it difficult to unparse, 
and the resultant plan is not always optimal or efficient for unparsing use 
cases, for example
   
   https://gist.github.com/sgrebnov/5071d2834e812b62bfdf434cf7e7e54c
   
   Original query (TPC-DS Q72)
   ```sql
   select  i_item_desc
         ,w_warehouse_name
         ,d1.d_week_seq
         ,sum(case when p_promo_sk is null then 1 else 0 end) no_promo
         ,sum(case when p_promo_sk is not null then 1 else 0 end) promo
         ,count(*) total_cnt
   from catalog_sales
   join inventory on (cs_item_sk = inv_item_sk)
   join warehouse on (w_warehouse_sk=inv_warehouse_sk)
   join item on (i_item_sk = cs_item_sk)
   join customer_demographics on (cs_bill_cdemo_sk = cd_demo_sk)
   join household_demographics on (cs_bill_hdemo_sk = hd_demo_sk)
   join date_dim d1 on (cs_sold_date_sk = d1.d_date_sk)
   join date_dim d2 on (inv_date_sk = d2.d_date_sk)
   join date_dim d3 on (cs_ship_date_sk = d3.d_date_sk)
   left outer join promotion on (cs_promo_sk=p_promo_sk)
   left outer join catalog_returns on (cr_item_sk = cs_item_sk and 
cr_order_number = cs_order_number)
   where d1.d_week_seq = d2.d_week_seq
     and inv_quantity_on_hand < cs_quantity
     and d3.d_date > d1.d_date + INTERVAL '5 days'
     and hd_buy_potential = '501-1000'
     and d1.d_year = 1999
     and cd_marital_status = 'S'
   group by i_item_desc,w_warehouse_name,d1.d_week_seq
   order by total_cnt desc, i_item_desc, w_warehouse_name, d_week_seq
    LIMIT 100;
   ```
   
   Plan and query after applying `optimize_projections` rule. Notice the 
additional projections added after joins.
   
   <img width="739" alt="image" 
src="https://github.com/user-attachments/assets/c76d00e1-e3e4-423b-8dad-a48b58a9a63d";>
   
   
   ```sql
   select
        "i_item_desc",
        "w_warehouse_name",
        "d_week_seq",
        sum(case when "p_promo_sk" is null then 1 else 0 end) as "no_promo",
        sum(case when "p_promo_sk" is not null then 1 else 0 end) as "promo",
        count(1) as "total_cnt"
   from
        (
        select
                "w_warehouse_name",
                "i_item_desc",
                "d_week_seq",
                "p_promo_sk"
        from
                (
                select
                        "cs_item_sk",
                        "cs_order_number",
                        "w_warehouse_name",
                        "i_item_desc",
                        "d_week_seq",
                        "promotion"."p_promo_sk"
                from
                        (
                        select
                                "cs_item_sk",
                                "cs_promo_sk",
                                "cs_order_number",
                                "w_warehouse_name",
                                "i_item_desc",
                                "d_week_seq"
                        from
                                (
                                select
                                        "cs_ship_date_sk",
                                        "cs_item_sk",
   ...
   ```
   
   ## Rationale for this change
   
   To support unparsing plans after `optimize_projections` is applied, it is 
proposed to add the `optimize_projections_preserve_existing_projections` config 
option to prevent the optimization logic from creating or removing projections 
and to preserve the original structure. It ensures the query layout remains 
simple and readable, relying on the underlying SQL engine to apply its own 
optimizations during execution.
   
   ## Are these changes tested?
   
   Added test for `optimize_projections_preserve_existing_projections` config 
options. Unparsing have been tested by running all TPC-H and TPC-DS queries 
with `optimization_projections` enabled.
   
   ## Are there any user-facing changes?
   
   Yes, a new `optimize_projections_preserve_existing_projections` 
configuration option has been introduced, which can be specified via 
`SessionConfig` or at a lower level using `OptimizerContext::new_with_options`.
   
   ```rust
   SessionStateBuilder::new()
     .with_config(
         
SessionConfig::new().with_optimize_projections_preserve_existing_projections(true),
     )
     .build();
   ```
   Or 
   
   ```rust
   let mut config = ConfigOptions::new();
   config
       .optimizer
       .optimize_projections_preserve_existing_projections =
       preserve_projections;
   let optimizer_context = OptimizerContext::new_with_options(config);
   ```
   
   There is no changes in default behavior.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: github-unsubscr...@datafusion.apache.org
For additional commands, e-mail: github-h...@datafusion.apache.org

Reply via email to