mingmwang commented on PR #5973:
URL: https://github.com/apache/arrow-datafusion/pull/5973#issuecomment-1506431739

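For query 18, I think the plan itself is the problem. The bottleneck is the join order and the build-side selection, not the aggregations. For example, the LeftSemi HashJoinExec builds its hash table on the 6,001,215-row output of the other joins (build_mem_used=806510152, build_time=676.460128ms). By comparison, the aggregation over lineitem reports elapsed_compute=434.341183ms.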
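The build-side point can be illustrated with the usual "hash the smaller input" heuristic. The minimal Python sketch below feeds that heuristic the row counts that the LeftSemi HashJoinExec reports in the plan below. `JoinInput` and `choose_build_side` are hypothetical names, not DataFusion APIs. A real optimizer would work from estimated statistics rather than row counts observed after execution.

```python
from dataclasses import dataclass


@dataclass
class JoinInput:
    """One side of a hash join, described only by its row count."""
    name: str
    rows: int


def choose_build_side(left: JoinInput, right: JoinInput) -> tuple[JoinInput, JoinInput]:
    """Return (build, probe), hashing whichever input has fewer rows."""
    # Cardinality heuristic: a smaller build side means a smaller hash table,
    # which should reduce build_time and build_mem_used.
    if left.rows <= right.rows:
        return left, right
    return right, left


# Row counts reported by the LeftSemi HashJoinExec in the plan:
# build_input_rows=6001215 for the current build (left) side and
# input_rows=456 for the current probe (right) side.
current_left = JoinInput("customer/orders/lineitem join output", 6_001_215)
current_right = JoinInput("filtered l_orderkey subquery", 456)

build, probe = choose_build_side(current_left, current_right)
swapped = build is current_right
print(f"build={build.name} ({build.rows} rows), probe={probe.name} ({probe.rows} rows)")
if swapped:
    # Swapping the inputs of a semi join also means mirroring its type.
    print("swap inputs: LeftSemi becomes RightSemi")
```

With `mode=CollectLeft` the left input is the one collected into the hash table, so here the large side gets hashed. Mirroring the semi join (DataFusion's `JoinType` has a `RightSemi` variant) would let the handful of subquery keys be hashed instead.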
   
```
=== Physical plan with metrics ===
SortExec: expr=[o_totalprice@4 DESC,o_orderdate@3 ASC NULLS LAST], metrics=[output_rows=57, elapsed_compute=11.461µs, spill_count=0, spilled_bytes=0]
  AggregateExec: mode=Single, gby=[c_name@1 as c_name, c_custkey@0 as c_custkey, o_orderkey@2 as o_orderkey, o_orderdate@4 as o_orderdate, o_totalprice@3 as o_totalprice], aggr=[SUM(lineitem.l_quantity)], metrics=[output_rows=57, elapsed_compute=49.165µs, spill_count=0, spilled_bytes=0, mem_used=0]
    CoalesceBatchesExec: target_batch_size=8192, metrics=[output_rows=399, elapsed_compute=2.681µs, spill_count=0, spilled_bytes=0, mem_used=0]
      HashJoinExec: mode=CollectLeft, join_type=LeftSemi, on=[(Column { name: "o_orderkey", index: 2 }, Column { name: "l_orderkey", index: 0 })], metrics=[output_rows=456, input_rows=456, input_batches=2, build_input_batches=733, build_input_rows=6001215, output_batches=2, build_mem_used=806510152, build_time=676.460128ms, join_time=2.88546ms]
        ProjectionExec: expr=[c_custkey@0 as c_custkey, c_name@1 as c_name, o_orderkey@2 as o_orderkey, o_totalprice@3 as o_totalprice, o_orderdate@4 as o_orderdate, l_quantity@6 as l_quantity], metrics=[output_rows=6001215, elapsed_compute=116.294µs, spill_count=0, spilled_bytes=0, mem_used=0]
          CoalesceBatchesExec: target_batch_size=8192, metrics=[output_rows=6001215, elapsed_compute=50.883µs, spill_count=0, spilled_bytes=0, mem_used=0]
            HashJoinExec: mode=CollectLeft, join_type=Inner, on=[(Column { name: "o_orderkey", index: 2 }, Column { name: "l_orderkey", index: 0 })], metrics=[output_rows=6001215, input_rows=6001215, input_batches=733, build_input_batches=184, build_input_rows=1500000, output_batches=733, build_mem_used=177373616, build_time=154.423989ms, join_time=268.907576ms]
              ProjectionExec: expr=[c_custkey@0 as c_custkey, c_name@1 as c_name, o_orderkey@2 as o_orderkey, o_totalprice@4 as o_totalprice, o_orderdate@5 as o_orderdate], metrics=[output_rows=1500000, elapsed_compute=34.671µs, spill_count=0, spilled_bytes=0, mem_used=0]
                CoalesceBatchesExec: target_batch_size=8192, metrics=[output_rows=1500000, elapsed_compute=14.97µs, spill_count=0, spilled_bytes=0, mem_used=0]
                  HashJoinExec: mode=CollectLeft, join_type=Inner, on=[(Column { name: "c_custkey", index: 0 }, Column { name: "o_custkey", index: 1 })], metrics=[output_rows=1500000, input_rows=1500000, input_batches=184, build_input_batches=19, build_input_rows=150000, output_batches=184, build_mem_used=14652720, build_time=7.173373ms, join_time=64.614167ms]
                    ParquetExec: limit=None, partitions={1 group: [[Users/mingmwang/gitrepo/apache/arrow-datafusion/benchmarks/parquet_data/customer/part-0.parquet]]}, projection=[c_custkey, c_name], metrics=[output_rows=150000, elapsed_compute=1ns, spill_count=0, spilled_bytes=0, mem_used=0, pushdown_rows_filtered=0, num_predicate_creation_errors=0, predicate_evaluation_errors=0, row_groups_pruned=0, bytes_scanned=566600, page_index_rows_filtered=0, pushdown_eval_time=2ns, time_elapsed_scanning_total=5.384791ms, time_elapsed_processing=5.024293ms, time_elapsed_scanning_until_data=2.83875ms, time_elapsed_opening=787.167µs, page_index_eval_time=2ns]
                    ParquetExec: limit=None, partitions={1 group: [[Users/mingmwang/gitrepo/apache/arrow-datafusion/benchmarks/parquet_data/orders/part-0.parquet]]}, projection=[o_orderkey, o_custkey, o_totalprice, o_orderdate], metrics=[output_rows=1500000, elapsed_compute=1ns, spill_count=0, spilled_bytes=0, mem_used=0, pushdown_rows_filtered=0, num_predicate_creation_errors=0, predicate_evaluation_errors=0, row_groups_pruned=0, bytes_scanned=13916402, page_index_rows_filtered=0, pushdown_eval_time=2ns, time_elapsed_scanning_total=112.261704ms, time_elapsed_processing=45.363041ms, time_elapsed_scanning_until_data=6.063625ms, time_elapsed_opening=549.5µs, page_index_eval_time=2ns]
              ParquetExec: limit=None, partitions={1 group: [[Users/mingmwang/gitrepo/apache/arrow-datafusion/benchmarks/parquet_data/lineitem/part-0.parquet]]}, projection=[l_orderkey, l_quantity], metrics=[output_rows=6001215, elapsed_compute=1ns, spill_count=0, spilled_bytes=0, mem_used=0, pushdown_rows_filtered=0, num_predicate_creation_errors=0, predicate_evaluation_errors=0, row_groups_pruned=0, bytes_scanned=12170874, page_index_rows_filtered=0, pushdown_eval_time=2ns, time_elapsed_scanning_total=324.056333ms, time_elapsed_processing=50.1033ms, time_elapsed_scanning_until_data=2.464666ms, time_elapsed_opening=2.0565ms, page_index_eval_time=2ns]
        ProjectionExec: expr=[l_orderkey@0 as l_orderkey], metrics=[output_rows=57, elapsed_compute=458ns, spill_count=0, spilled_bytes=0, mem_used=0]
          CoalesceBatchesExec: target_batch_size=8192, metrics=[output_rows=57, elapsed_compute=19.342µs, spill_count=0, spilled_bytes=0, mem_used=0]
            FilterExec: SUM(lineitem.l_quantity)@1 > Some(30000),25,2, metrics=[output_rows=57, elapsed_compute=1.093341ms, spill_count=0, spilled_bytes=0, mem_used=0]
              AggregateExec: mode=Single, gby=[l_orderkey@0 as l_orderkey], aggr=[SUM(lineitem.l_quantity)], metrics=[output_rows=1500000, elapsed_compute=434.341183ms, spill_count=0, spilled_bytes=0, mem_used=0]
                ParquetExec: limit=None, partitions={1 group: [[Users/mingmwang/gitrepo/apache/arrow-datafusion/benchmarks/parquet_data/lineitem/part-0.parquet]]}, projection=[l_orderkey, l_quantity], metrics=[output_rows=6001215, elapsed_compute=1ns, spill_count=0, spilled_bytes=0, mem_used=0, pushdown_rows_filtered=0, num_predicate_creation_errors=0, predicate_evaluation_errors=0, row_groups_pruned=0, bytes_scanned=12170874, page_index_rows_filtered=0, pushdown_eval_time=2ns, time_elapsed_scanning_total=443.876134ms, time_elapsed_processing=49.172503ms, time_elapsed_scanning_until_data=2.226959ms, time_elapsed_opening=844.959µs, page_index_eval_time=2ns]
```
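The claim that the joins, not the aggregations, dominate can be checked by tallying the timing metrics per operator. The minimal Python sketch below does that. The operator labels are our own shorthand, and only the timing fields are copied from the plan. Summing a join's `build_time` and `join_time` against an aggregation's `elapsed_compute` is an approximation, because the operators report different metrics.

```python
import re

# Convert the duration suffixes printed in the plan's metrics to milliseconds.
UNIT_TO_MS = {"ns": 1e-6, "µs": 1e-3, "ms": 1.0, "s": 1000.0}
DURATION = re.compile(r"(\w+)=([\d.]+)(ns|µs|ms|s)\b")


def timings_ms(metrics: str) -> dict[str, float]:
    """Extract every name=<duration> pair from a metrics string, in milliseconds."""
    return {name: float(value) * UNIT_TO_MS[unit]
            for name, value, unit in DURATION.findall(metrics)}


# Timing fields copied from the plan; the dictionary keys are our own labels.
OPERATORS = {
    "LeftSemi HashJoinExec (o_orderkey = l_orderkey)":
        "build_time=676.460128ms, join_time=2.88546ms",
    "Inner HashJoinExec (o_orderkey = l_orderkey)":
        "build_time=154.423989ms, join_time=268.907576ms",
    "Inner HashJoinExec (c_custkey = o_custkey)":
        "build_time=7.173373ms, join_time=64.614167ms",
    "AggregateExec gby=[l_orderkey]":
        "elapsed_compute=434.341183ms",
    "AggregateExec (top-level, 5 group keys)":
        "elapsed_compute=49.165µs",
}

# Total time per operator, printed from slowest to fastest.
totals = {op: sum(timings_ms(m).values()) for op, m in OPERATORS.items()}
for op, ms in sorted(totals.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{ms:10.3f} ms  {op}")

join_ms = sum(ms for op, ms in totals.items() if "HashJoinExec" in op)
agg_ms = sum(ms for op, ms in totals.items() if "AggregateExec" in op)
print(f"joins: {join_ms:.1f} ms, aggregations: {agg_ms:.1f} ms")
```

On these figures the LeftSemi join (about 679 ms) ranks ahead of the l_orderkey aggregation (about 434 ms). The three joins together take roughly 1174 ms, against about 434 ms for the two aggregations.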

