neilconway opened a new pull request, #21726:
URL: https://github.com/apache/datafusion/pull/21726

   ## Which issue does this PR close?
   
   - Closes #21724.
   
   ## Rationale for this change
   
   Some profiling suggested that `OptimizeProjections` was among the most 
heavyweight of the logical optimizer passes for TPC-DS. This PR implements two 
distinct optimizations:
   
   1. In `RequiredIndices::add_expr`, the previous implementation created a 
`HashSet` and walked the expression tree twice, adding referenced columns to the 
`HashSet`. Finally, members of the `HashSet` were converted to indices. It is 
faster to walk the expression tree once ourselves and convert column 
references to indices directly. This saves the `HashSet` allocation and 
insertions, plus one redundant tree walk.
   
   2. In `optimize_projections`, we computed the minimal required set of 
`GROUP BY` columns, based on functional dependencies. This computation is 
relatively expensive, and in the common case where there are no functional 
dependencies it is always a no-op. Add a short-circuit to skip the redundant 
computation in this scenario.
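   
   The two ideas above can be sketched in isolation as follows. This is a 
minimal illustration, not the code in this PR: the `Expr` enum, the 
`FunctionalDependency` struct, and the functions `collect_indices`, 
`required_indices`, and `minimal_group_by` are all hypothetical stand-ins, and 
the schema is modeled as a plain slice of column names rather than a 
DataFusion schema.
   
   ```rust
   // Hypothetical, simplified expression tree (not DataFusion's `Expr`).
   enum Expr {
       Column(String),
       Literal(i64),
       Binary(Box<Expr>, Box<Expr>),
   }

   // Optimization 1 (sketch): walk the tree once and push the schema index of
   // every column reference directly, with no intermediate HashSet.
   // Column names that are not in `schema` are silently skipped here.
   fn collect_indices(expr: &Expr, schema: &[&str], out: &mut Vec<usize>) {
       match expr {
           Expr::Column(name) => {
               if let Some(idx) = schema.iter().position(|c| *c == name.as_str()) {
                   out.push(idx);
               }
           }
           Expr::Literal(_) => {}
           Expr::Binary(left, right) => {
               collect_indices(left, schema, out);
               collect_indices(right, schema, out);
           }
       }
   }

   // Collect indices for several expressions, then sort and deduplicate once.
   fn required_indices(exprs: &[Expr], schema: &[&str]) -> Vec<usize> {
       let mut out = Vec::new();
       for expr in exprs {
           collect_indices(expr, schema, &mut out);
       }
       out.sort_unstable();
       out.dedup();
       out
   }

   // Hypothetical functional dependency: `source` columns determine `target` columns.
   struct FunctionalDependency {
       source: Vec<usize>,
       target: Vec<usize>,
   }

   // Optimization 2 (sketch): skip the minimization when there is nothing to use.
   fn minimal_group_by(group_by: &[usize], deps: &[FunctionalDependency]) -> Vec<usize> {
       // Short-circuit: without functional dependencies no column can be dropped.
       if deps.is_empty() {
           return group_by.to_vec();
       }
       // Otherwise drop each column that is determined by columns still kept.
       let mut kept: Vec<usize> = group_by.to_vec();
       for &col in group_by {
           let determined = deps.iter().any(|d| {
               d.target.contains(&col)
                   && !d.source.contains(&col)
                   && d.source.iter().all(|s| kept.contains(s))
           });
           if determined {
               kept.retain(|&c| c != col);
           }
       }
       kept
   }
   ```
   
   With a schema of `["a", "b", "c"]`, calling `required_indices` on an 
expression that references `c`, `a`, and `c` again yields the indices of `a` 
and `c` once each. Calling `minimal_group_by` with an empty dependency list 
returns the input columns unchanged without entering the loop.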
   
   Results on a newly added `optimize_projections` microbenchmark:
   
   ```
     - tpch_q3: 14.6 µs → 11.9 µs (−18.5%)
     - tpch_q5: 17.4 µs → 14.0 µs (−19.4%)
     - clickbench_groupby: 10.3 µs → 6.8 µs (−34.1%)
     - tpcds_subquery: 11.2 µs → 8.7 µs (−22.1%)
     - small_schema: 1.87 µs → 1.68 µs (−10.3%)
   ```
   
   ## What changes are included in this PR?
   
   * Add microbenchmark for `optimize_projections`
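   * Implement the two optimizations described above: a single-pass `RequiredIndices::add_expr` and a short-circuit for `GROUP BY` minimization when there are no functional dependencies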
   
   ## Are these changes tested?
   
   Yes.
   
   ## Are there any user-facing changes?
   
   No.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

