Re: [I] TPCH q1 with no predicates is 2x slower than duckdb [datafusion]

via GitHub Sat, 03 Jan 2026 06:00:02 -0800


Dandandan commented on issue #18411:
URL: https://github.com/apache/datafusion/issues/18411#issuecomment-3707073395


   Looking at the query and the profile, I think there is a nice optimization 
for `AVG` that we don't do yet.
   
   Quite a bit of time is spent in the (sum/avg/count) accumulators:
   
   <img width="1570" height="144" alt="Image" 
src="https://github.com/user-attachments/assets/3d6ffdd5-0414-4355-8397-f527226c3b5d" />
   
   
   ```
   select
       l_returnflag,
       l_linestatus,
       sum(l_quantity) as sum_qty,
       sum(l_extendedprice) as sum_base_price,
       sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
       sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
       avg(l_quantity) as avg_qty,
       avg(l_extendedprice) as avg_price,
       avg(l_discount) as avg_disc,
       count(*) as count_order
   from
       lineitem
   where
           l_shipdate <= date '1998-09-02'
   group by
       l_returnflag,
       l_linestatus
   order by
       l_returnflag,
       l_linestatus;
   ```
   
   Zooming in on the avg:
   ```
       avg(l_quantity) as avg_qty,
       avg(l_extendedprice) as avg_price,
       avg(l_discount) as avg_disc,
       count(*) as count_order
   ```
   Each of those `avg` accumulators tracks its own row count, so the same count is computed redundantly (and once more by `count(*)`).
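
To make the redundancy concrete, here is a minimal sketch in Python (not DataFusion's actual accumulator code). `AvgState` is a hypothetical per-group state and the rows are made-up values for a single group; the point is only that every `avg` ends up maintaining a counter identical to `count(*)`.

```python
from dataclasses import dataclass


@dataclass
class AvgState:
    """Hypothetical per-group state for one AVG: a running sum plus its own count."""
    total: float = 0.0
    count: int = 0

    def update(self, value: float) -> None:
        self.total += value
        self.count += 1

    def evaluate(self) -> float:
        return self.total / self.count


# Made-up (l_quantity, l_extendedprice, l_discount) rows for a single group.
rows = [(17.0, 21168.23, 0.04), (36.0, 45983.16, 0.09), (8.0, 13309.60, 0.10)]

avg_qty, avg_price, avg_disc = AvgState(), AvgState(), AvgState()
count_order = 0  # stands in for count(*)

for qty, price, disc in rows:
    avg_qty.update(qty)
    avg_price.update(price)
    avg_disc.update(disc)
    count_order += 1

# Four counters, all tracking the same number of rows.
counters = [avg_qty.count, avg_price.count, avg_disc.count, count_order]
print(counters)
```

With non-null inputs all four counters always agree, which is why three of them are wasted work per row.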
   
   We should be able to rewrite it to use a shared count (and thus faster 
accumulators):
   
   ```
       sum(l_quantity) / count_order as avg_qty,
       sum(l_extendedprice) / count_order as avg_price,
       sum(l_discount) / count_order as avg_disc,
       count_order
   ```
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

