alamb commented on issue #6278: URL: https://github.com/apache/arrow-datafusion/issues/6278#issuecomment-1539077274
@viirya is this something you can look into?
## Reproducer
You can reproduce the results using `datafusion-cli`:
Make data:
```shell
cd benchmarks
./bench.sh data all
```
Install datafusion-cli locally (or run it however else you want):
```shell
cargo install --path datafusion-cli
```
Run the query:
```sql
❯ -- load lineitem table into memory
create table lineitem as select * from 'data/lineitem';
❯ select
l_returnflag,
l_linestatus,
sum(l_quantity) as sum_qty,
sum(l_extendedprice) as sum_base_price,
sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
avg(l_quantity) as avg_qty,
avg(l_extendedprice) as avg_price,
avg(l_discount) as avg_disc,
count(*) as count_order
from
lineitem
where
l_shipdate <= date '1998-09-02'
group by
l_returnflag,
l_linestatus
order by
l_returnflag,
l_linestatus;
+--------------+--------------+-------------+-----------------+-------------------+---------------------+-----------+--------------+----------+-------------+
| l_returnflag | l_linestatus | sum_qty | sum_base_price |
sum_disc_price | sum_charge | avg_qty | avg_price | avg_disc |
count_order |
+--------------+--------------+-------------+-----------------+-------------------+---------------------+-----------+--------------+----------+-------------+
| A | F | 37734107.00 | 56586554400.73 |
53758257134.8700 | 55909065222.827692 | 25.522005 | 38273.129734 | 0.049985 |
1478493 |
| N | F | 991417.00 | 1487504710.38 |
1413082168.0541 | 1469649223.194375 | 25.516471 | 38284.467760 | 0.050093 |
38854 |
| N | O | 74476040.00 | 111701729697.74 |
106118230307.6056 | 110367043872.497010 | 25.502226 | 38249.117988 | 0.049996 |
2920374 |
| R | F | 37719753.00 | 56568041380.90 |
53741292684.6040 | 55889619119.831932 | 25.505793 | 38250.854626 | 0.050009 |
1478870 |
+--------------+--------------+-------------+-----------------+-------------------+---------------------+-----------+--------------+----------+-------------+
4 rows in set. Query took 6.966 seconds.
```
## Explain Plans
The explain plans are identical between `23.0.0` and `main`:
```
❯ explain select
l_returnflag,
l_linestatus,
sum(l_quantity) as sum_qty,
sum(l_extendedprice) as sum_base_price,
sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
avg(l_quantity) as avg_qty,
avg(l_extendedprice) as avg_price,
avg(l_discount) as avg_disc,
count(*) as count_order
from
lineitem
where
l_shipdate <= date '1998-09-02'
group by
l_returnflag,
l_linestatus
order by
l_returnflag,
l_linestatus;
+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------+
| plan_type | plan
|
+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------+
| logical_plan | Sort: lineitem.l_returnflag ASC NULLS LAST,
lineitem.l_linestatus ASC NULLS LAST
|
| | Projection: lineitem.l_returnflag,
lineitem.l_linestatus, SUM(lineitem.l_quantity) AS sum_qty,
SUM(lineitem.l_extendedprice) AS sum_base_price, SUM(lineitem.l_extendedprice *
Int64(1) - lineitem.l_discount) AS sum_disc_price, SUM(lineitem.l_extendedprice
* Int64(1) - lineitem.l_discount * Int64(1) + lineitem.l_tax) AS sum_charge,
AVG(lineitem.l_quantity) AS avg_qty, AVG(lineitem.l_extendedprice) AS
avg_price, AVG(lineitem.l_discount) AS avg_disc, COUNT(UInt8(1)) AS count_order
|
| | Aggregate: groupBy=[[lineitem.l_returnflag,
lineitem.l_linestatus]], aggr=[[SUM(lineitem.l_quantity),
SUM(lineitem.l_extendedprice), SUM(lineitem.l_extendedprice *
(Decimal128(Some(1),20,0) - lineitem.l_discount)Decimal128(Some(1),20,0) -
lineitem.l_discountlineitem.l_discountDecimal128(Some(1),20,0)lineitem.l_extendedprice
AS lineitem.l_extendedprice * Decimal128(Some(1),20,0) - lineitem.l_discount)
AS SUM(lineitem.l_extendedprice * Int64(1) - lineitem.l_discount),
SUM(lineitem.l_extendedprice * (Decimal128(Some(1),20,0) -
lineitem.l_discount)Decimal128(Some(1),20,0) -
lineitem.l_discountlineitem.l_discountDecimal128(Some(1),20,0)lineitem.l_extendedprice
AS lineitem.l_extendedprice * Decimal128(Some(1),20,0) - lineitem.l_discount *
(Decimal128(Some(1),20,0) + lineitem.l_tax)) AS SUM(lineitem.l_extendedprice *
Int64(1) - lineitem.l_discount * Int64(1) + lineitem.l_tax),
AVG(lineitem.l_quantity), AVG(lineitem.l_extendedprice),
AVG(lineitem.l_discount), COUNT(U
Int8(1))]] |
| | Projection: lineitem.l_extendedprice *
(Decimal128(Some(1),20,0) - lineitem.l_discount) AS lineitem.l_extendedprice *
(Decimal128(Some(1),20,0) - lineitem.l_discount)Decimal128(Some(1),20,0) -
lineitem.l_discountlineitem.l_discountDecimal128(Some(1),20,0)lineitem.l_extendedprice,
lineitem.l_quantity, lineitem.l_extendedprice, lineitem.l_discount,
lineitem.l_tax, lineitem.l_returnflag, lineitem.l_linestatus
|
| | Filter: lineitem.l_shipdate <= Date32("10471")
|
| | TableScan: lineitem projection=[l_quantity,
l_extendedprice, l_discount, l_tax, l_returnflag, l_linestatus, l_shipdate]
|
| physical_plan | SortPreservingMergeExec: [l_returnflag@0 ASC NULLS
LAST,l_linestatus@1 ASC NULLS LAST]
|
| | SortExec: expr=[l_returnflag@0 ASC NULLS
LAST,l_linestatus@1 ASC NULLS LAST]
|
| | ProjectionExec: expr=[l_returnflag@0 as l_returnflag,
l_linestatus@1 as l_linestatus, SUM(lineitem.l_quantity)@2 as sum_qty,
SUM(lineitem.l_extendedprice)@3 as sum_base_price, SUM(lineitem.l_extendedprice
* Int64(1) - lineitem.l_discount)@4 as sum_disc_price,
SUM(lineitem.l_extendedprice * Int64(1) - lineitem.l_discount * Int64(1) +
lineitem.l_tax)@5 as sum_charge, AVG(lineitem.l_quantity)@6 as avg_qty,
AVG(lineitem.l_extendedprice)@7 as avg_price, AVG(lineitem.l_discount)@8 as
avg_disc, COUNT(UInt8(1))@9 as count_order]
|
| | AggregateExec: mode=FinalPartitioned,
gby=[l_returnflag@0 as l_returnflag, l_linestatus@1 as l_linestatus],
aggr=[SUM(lineitem.l_quantity), SUM(lineitem.l_extendedprice),
SUM(lineitem.l_extendedprice * Int64(1) - lineitem.l_discount),
SUM(lineitem.l_extendedprice * Int64(1) - lineitem.l_discount * Int64(1) +
lineitem.l_tax), AVG(lineitem.l_quantity), AVG(lineitem.l_extendedprice),
AVG(lineitem.l_discount), COUNT(UInt8(1))]
|
| | CoalesceBatchesExec: target_batch_size=8192
|
| | RepartitionExec: partitioning=Hash([Column {
name: "l_returnflag", index: 0 }, Column { name: "l_linestatus", index: 1 }],
16), input_partitions=16
|
| | AggregateExec: mode=Partial,
gby=[l_returnflag@5 as l_returnflag, l_linestatus@6 as l_linestatus],
aggr=[SUM(lineitem.l_quantity), SUM(lineitem.l_extendedprice),
SUM(lineitem.l_extendedprice * Int64(1) - lineitem.l_discount),
SUM(lineitem.l_extendedprice * Int64(1) - lineitem.l_discount * Int64(1) +
lineitem.l_tax), AVG(lineitem.l_quantity), AVG(lineitem.l_extendedprice),
AVG(lineitem.l_discount), COUNT(UInt8(1))]
|
| | ProjectionExec: expr=[l_extendedprice@1 *
(Some(1),20,0 - l_discount@2) as lineitem.l_extendedprice *
(Decimal128(Some(1),20,0) - lineitem.l_discount)Decimal128(Some(1),20,0) -
lineitem.l_discountlineitem.l_discountDecimal128(Some(1),20,0)lineitem.l_extendedprice,
l_quantity@0 as l_quantity, l_extendedprice@1 as l_extendedprice, l_discount@2
as l_discount, l_tax@3 as l_tax, l_returnflag@4 as l_returnflag, l_linestatus@5
as l_linestatus]
|
| | CoalesceBatchesExec:
target_batch_size=8192
|
| | FilterExec: l_shipdate@6 <= 10471
|
| | RepartitionExec:
partitioning=RoundRobinBatch(16), input_partitions=1
|
| | MemoryExec: partitions=1,
partition_sizes=[733]
|
| |
|
+---------------+------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-----------+
2 rows in set. Query took 0.011 seconds.
```
[explain_22.0.0.txt](https://github.com/apache/arrow-datafusion/files/11425234/explain_22.0.0.txt)
[explain_main.txt](https://github.com/apache/arrow-datafusion/files/11425235/explain_main.txt)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
