alamb commented on issue #6278:
URL: 
https://github.com/apache/arrow-datafusion/issues/6278#issuecomment-1539077274

   @viirya  is this something you can look into? 
   
   ## Reproducer
   You can reproduce the results using `datafusion-cli`:
   
   Make data:
   
   ```shell
   cd benchmarks
   ./bench.sh data all
   ```
   
   Install datafusion-cli locally (or run it however else you want):
   ```shell
   cargo install --path datafusion-cli
   ```
   
   Run the query:
   ```sql
   
   ❯ -- load lineitem table into memory
   create table lineitem as select * from 'data/lineitem';
   
   
   ❯ select
       l_returnflag,
       l_linestatus,
       sum(l_quantity) as sum_qty,
       sum(l_extendedprice) as sum_base_price,
       sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
       sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
       avg(l_quantity) as avg_qty,
       avg(l_extendedprice) as avg_price,
       avg(l_discount) as avg_disc,
       count(*) as count_order
   from
       lineitem
   where
           l_shipdate <= date '1998-09-02'
   group by
       l_returnflag,
       l_linestatus
   order by
       l_returnflag,
       l_linestatus;
   
+--------------+--------------+-------------+-----------------+-------------------+---------------------+-----------+--------------+----------+-------------+
   | l_returnflag | l_linestatus | sum_qty     | sum_base_price  | 
sum_disc_price    | sum_charge          | avg_qty   | avg_price    | avg_disc | 
count_order |
   
+--------------+--------------+-------------+-----------------+-------------------+---------------------+-----------+--------------+----------+-------------+
   | A            | F            | 37734107.00 | 56586554400.73  | 
53758257134.8700  | 55909065222.827692  | 25.522005 | 38273.129734 | 0.049985 | 
1478493     |
   | N            | F            | 991417.00   | 1487504710.38   | 
1413082168.0541   | 1469649223.194375   | 25.516471 | 38284.467760 | 0.050093 | 
38854       |
   | N            | O            | 74476040.00 | 111701729697.74 | 
106118230307.6056 | 110367043872.497010 | 25.502226 | 38249.117988 | 0.049996 | 
2920374     |
   | R            | F            | 37719753.00 | 56568041380.90  | 
53741292684.6040  | 55889619119.831932  | 25.505793 | 38250.854626 | 0.050009 | 
1478870     |
   
+--------------+--------------+-------------+-----------------+-------------------+---------------------+-----------+--------------+----------+-------------+
   4 rows in set. Query took 6.966 seconds.
   ```
   
   ## Explain Plans
   The explain plans are identical between `23.0.0` and `main`: 
   ```
   
   ❯ explain select
       l_returnflag,
       l_linestatus,
       sum(l_quantity) as sum_qty,
       sum(l_extendedprice) as sum_base_price,
       sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
       sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
       avg(l_quantity) as avg_qty,
       avg(l_extendedprice) as avg_price,
       avg(l_discount) as avg_disc,
       count(*) as count_order
   from
       lineitem
   where
           l_shipdate <= date '1998-09-02'
   group by
       l_returnflag,
       l_linestatus
   order by
       l_returnflag,
       l_linestatus;
   
+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 -----------+
   | plan_type     | plan                                                       
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      
            |
   
+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 -----------+
   | logical_plan  | Sort: lineitem.l_returnflag ASC NULLS LAST, 
lineitem.l_linestatus ASC NULLS LAST                                            
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                     
            |
   |               |   Projection: lineitem.l_returnflag, 
lineitem.l_linestatus, SUM(lineitem.l_quantity) AS sum_qty, 
SUM(lineitem.l_extendedprice) AS sum_base_price, SUM(lineitem.l_extendedprice * 
Int64(1) - lineitem.l_discount) AS sum_disc_price, SUM(lineitem.l_extendedprice 
* Int64(1) - lineitem.l_discount * Int64(1) + lineitem.l_tax) AS sum_charge, 
AVG(lineitem.l_quantity) AS avg_qty, AVG(lineitem.l_extendedprice) AS 
avg_price, AVG(lineitem.l_discount) AS avg_disc, COUNT(UInt8(1)) AS count_order 
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
             
            |
   |               |     Aggregate: groupBy=[[lineitem.l_returnflag, 
lineitem.l_linestatus]], aggr=[[SUM(lineitem.l_quantity), 
SUM(lineitem.l_extendedprice), SUM(lineitem.l_extendedprice * 
(Decimal128(Some(1),20,0) - lineitem.l_discount)Decimal128(Some(1),20,0) - 
lineitem.l_discountlineitem.l_discountDecimal128(Some(1),20,0)lineitem.l_extendedprice
 AS lineitem.l_extendedprice * Decimal128(Some(1),20,0) - lineitem.l_discount) 
AS SUM(lineitem.l_extendedprice * Int64(1) - lineitem.l_discount), 
SUM(lineitem.l_extendedprice * (Decimal128(Some(1),20,0) - 
lineitem.l_discount)Decimal128(Some(1),20,0) - 
lineitem.l_discountlineitem.l_discountDecimal128(Some(1),20,0)lineitem.l_extendedprice
 AS lineitem.l_extendedprice * Decimal128(Some(1),20,0) - lineitem.l_discount * 
(Decimal128(Some(1),20,0) + lineitem.l_tax)) AS SUM(lineitem.l_extendedprice * 
Int64(1) - lineitem.l_discount * Int64(1) + lineitem.l_tax), 
AVG(lineitem.l_quantity), AVG(lineitem.l_extendedprice), 
AVG(lineitem.l_discount), COUNT(U
 Int8(1))]] |
   |               |       Projection: lineitem.l_extendedprice * 
(Decimal128(Some(1),20,0) - lineitem.l_discount) AS lineitem.l_extendedprice * 
(Decimal128(Some(1),20,0) - lineitem.l_discount)Decimal128(Some(1),20,0) - 
lineitem.l_discountlineitem.l_discountDecimal128(Some(1),20,0)lineitem.l_extendedprice,
 lineitem.l_quantity, lineitem.l_extendedprice, lineitem.l_discount, 
lineitem.l_tax, lineitem.l_returnflag, lineitem.l_linestatus                    
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                              
            |
   |               |         Filter: lineitem.l_shipdate <= Date32("10471")     
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      
            |
   |               |           TableScan: lineitem projection=[l_quantity, 
l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate]     
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                           
            |
   | physical_plan | SortPreservingMergeExec: [l_returnflag@0 ASC NULLS 
LAST,l_linestatus@1 ASC NULLS LAST]                                             
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                              
            |
   |               |   SortExec: expr=[l_returnflag@0 ASC NULLS 
LAST,l_linestatus@1 ASC NULLS LAST]                                             
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                      
            |
   |               |     ProjectionExec: expr=[l_returnflag@0 as l_returnflag, 
l_linestatus@1 as l_linestatus, SUM(lineitem.l_quantity)@2 as sum_qty, 
SUM(lineitem.l_extendedprice)@3 as sum_base_price, SUM(lineitem.l_extendedprice 
* Int64(1) - lineitem.l_discount)@4 as sum_disc_price, 
SUM(lineitem.l_extendedprice * Int64(1) - lineitem.l_discount * Int64(1) + 
lineitem.l_tax)@5 as sum_charge, AVG(lineitem.l_quantity)@6 as avg_qty, 
AVG(lineitem.l_extendedprice)@7 as avg_price, AVG(lineitem.l_discount)@8 as 
avg_disc, COUNT(UInt8(1))@9 as count_order]                                     
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
          
            |
   |               |       AggregateExec: mode=FinalPartitioned, 
gby=[l_returnflag@0 as l_returnflag, l_linestatus@1 as l_linestatus], 
aggr=[SUM(lineitem.l_quantity), SUM(lineitem.l_extendedprice), 
SUM(lineitem.l_extendedprice * Int64(1) - lineitem.l_discount), 
SUM(lineitem.l_extendedprice * Int64(1) - lineitem.l_discount * Int64(1) + 
lineitem.l_tax), AVG(lineitem.l_quantity), AVG(lineitem.l_extendedprice), 
AVG(lineitem.l_discount), COUNT(UInt8(1))]                                      
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                           
            |
   |               |         CoalesceBatchesExec: target_batch_size=8192        
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      
            |
   |               |           RepartitionExec: partitioning=Hash([Column { 
name: "l_returnflag", index: 0 }, Column { name: "l_linestatus", index: 1 }], 
16), input_partitions=16                                                        
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                            
            |
   |               |             AggregateExec: mode=Partial, 
gby=[l_returnflag@5 as l_returnflag, l_linestatus@6 as l_linestatus], 
aggr=[SUM(lineitem.l_quantity), SUM(lineitem.l_extendedprice), 
SUM(lineitem.l_extendedprice * Int64(1) - lineitem.l_discount), 
SUM(lineitem.l_extendedprice * Int64(1) - lineitem.l_discount * Int64(1) + 
lineitem.l_tax), AVG(lineitem.l_quantity), AVG(lineitem.l_extendedprice), 
AVG(lineitem.l_discount), COUNT(UInt8(1))]                                      
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                              
            |
   |               |               ProjectionExec: expr=[l_extendedprice@1 * 
(Some(1),20,0 - l_discount@2) as lineitem.l_extendedprice * 
(Decimal128(Some(1),20,0) - lineitem.l_discount)Decimal128(Some(1),20,0) - 
lineitem.l_discountlineitem.l_discountDecimal128(Some(1),20,0)lineitem.l_extendedprice,
 l_quantity@0 as l_quantity, l_extendedprice@1 as l_extendedprice, l_discount@2 
as l_discount, l_tax@3 as l_tax, l_returnflag@4 as l_returnflag, l_linestatus@5 
as l_linestatus]                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                           
            |
   |               |                 CoalesceBatchesExec: 
target_batch_size=8192                                                          
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                            
            |
   |               |                   FilterExec: l_shipdate@6 <= 10471        
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      
            |
   |               |                     RepartitionExec: 
partitioning=RoundRobinBatch(16), input_partitions=1                            
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                            
            |
   |               |                       MemoryExec: partitions=1, 
partition_sizes=[733]                                                           
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                 
            |
   |               |                                                            
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      
            |
   
+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 -----------+
   2 rows in set. Query took 0.011 seconds.
   ```
   
   
[explain_22.0.0.txt](https://github.com/apache/arrow-datafusion/files/11425234/explain_22.0.0.txt)
   
[explain_main.txt](https://github.com/apache/arrow-datafusion/files/11425235/explain_main.txt)
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to