alamb commented on issue #6278:
URL:
https://github.com/apache/arrow-datafusion/issues/6278#issuecomment-1539081601
Interestingly, the explain analyze output shows all the time being spent in
the AggregateExec `elapsed_compute=11.714486088s` -- maybe the aggregate exec
is re-calculating its input exprs incorrectly.
```
❯ explain analyze select
l_returnflag,
l_linestatus,
sum(l_quantity) as sum_qty,
sum(l_extendedprice) as sum_base_price,
sum(l_extendedprice * (1 - l_discount)) as sum_disc_price,
sum(l_extendedprice * (1 - l_discount) * (1 + l_tax)) as sum_charge,
avg(l_quantity) as avg_qty,
avg(l_extendedprice) as avg_price,
avg(l_discount) as avg_disc,
count(*) as count_order
from
lineitem
where
l_shipdate <= date '1998-09-02'
group by
l_returnflag,
l_linestatus
order by
l_returnflag,
l_linestatus;
+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan
|
+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Plan with Metrics | CoalescePartitionsExec, metrics=[output_rows=4,
elapsed_compute=21.427µs, spill_count=0, spilled_bytes=0, mem_used=0]
|
| | ProjectionExec: expr=[l_returnflag@0 as
l_returnflag, l_linestatus@1 as l_linestatus, SUM(lineitem.l_quantity)@2 as
sum_qty, SUM(lineitem.l_extendedprice)@3 as sum_base_price,
SUM(lineitem.l_extendedprice * Int64(1) - lineitem.l_discount)@4 as
sum_disc_price, SUM(lineitem.l_extendedprice * Int64(1) - lineitem.l_discount *
Int64(1) + lineitem.l_tax)@5 as sum_charge, AVG(lineitem.l_quantity)@6 as
avg_qty, AVG(lineitem.l_extendedprice)@7 as avg_price,
AVG(lineitem.l_discount)@8 as avg_disc, COUNT(UInt8(1))@9 as count_order],
metrics=[output_rows=4, elapsed_compute=114.909µs, spill_count=0,
spilled_bytes=0, mem_used=0] |
| | AggregateExec: mode=FinalPartitioned,
gby=[l_returnflag@0 as l_returnflag, l_linestatus@1 as l_linestatus],
aggr=[SUM(lineitem.l_quantity), SUM(lineitem.l_extendedprice),
SUM(lineitem.l_extendedprice * Int64(1) - lineitem.l_discount),
SUM(lineitem.l_extendedprice * Int64(1) - lineitem.l_discount * Int64(1) +
lineitem.l_tax), AVG(lineitem.l_quantity), AVG(lineitem.l_extendedprice),
AVG(lineitem.l_discount), COUNT(UInt8(1))], metrics=[output_rows=4,
elapsed_compute=3.558233ms, spill_count=0, spilled_bytes=0, mem_used=0]
|
| | CoalesceBatchesExec: target_batch_size=8192,
metrics=[output_rows=64, elapsed_compute=1.368734ms, spill_count=0,
spilled_bytes=0, mem_used=0]
|
| | RepartitionExec: partitioning=Hash([Column {
name: "l_returnflag", index: 0 }, Column { name: "l_linestatus", index: 1 }],
16), input_partitions=16, metrics=[fetch_time=199.828628049s,
repart_time=1.858216ms, send_time=227.016µs]
|
| | AggregateExec: mode=Partial,
gby=[l_returnflag@5 as l_returnflag, l_linestatus@6 as l_linestatus],
aggr=[SUM(lineitem.l_quantity), SUM(lineitem.l_extendedprice),
SUM(lineitem.l_extendedprice * Int64(1) - lineitem.l_discount),
SUM(lineitem.l_extendedprice * Int64(1) - lineitem.l_discount * Int64(1) +
lineitem.l_tax), AVG(lineitem.l_quantity), AVG(lineitem.l_extendedprice),
AVG(lineitem.l_discount), COUNT(UInt8(1))], metrics=[output_rows=64,
elapsed_compute=11.714486088s, spill_count=0, spilled_bytes=0, mem_used=0]
|
| | ProjectionExec: expr=[l_extendedprice@1 *
(Some(1),20,0 - l_discount@2) as lineitem.l_extendedprice *
(Decimal128(Some(1),20,0) - lineitem.l_discount)Decimal128(Some(1),20,0) -
lineitem.l_discountlineitem.l_discountDecimal128(Some(1),20,0)lineitem.l_extendedprice,
l_quantity@0 as l_quantity, l_extendedprice@1 as l_extendedprice, l_discount@2
as l_discount, l_tax@3 as l_tax, l_returnflag@4 as l_returnflag, l_linestatus@5
as l_linestatus], metrics=[output_rows=5916591, elapsed_compute=377.249296ms,
spill_count=0, spilled_bytes=0, mem_used=0]
|
| | CoalesceBatchesExec:
target_batch_size=8192, metrics=[output_rows=5916591,
elapsed_compute=222.551791ms, spill_count=0, spilled_bytes=0, mem_used=0]
|
| | FilterExec: l_shipdate@6 <= 10471,
metrics=[output_rows=5916591, elapsed_compute=202.848245ms, spill_count=0,
spilled_bytes=0, mem_used=0]
|
| | RepartitionExec:
partitioning=RoundRobinBatch(16), input_partitions=1,
metrics=[fetch_time=6.940083ms, repart_time=1ns, send_time=1.861309ms]
|
| | MemoryExec: partitions=1,
partition_sizes=[733], metrics=[]
|
| |
|
+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set. Query took 12.538 seconds.
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]