ariel-miculas commented on issue #22526: URL: https://github.com/apache/datafusion/issues/22526#issuecomment-4568611822
The spilling issue occurs when there's an operator which accumulates the
input partitions, such as `SortExec`.
This can be achieved by removing `limit 10` from q34:
```
diff --git a/benchmarks/queries/clickbench/queries/q34.sql
b/benchmarks/queries/clickbench/queries/q34.sql
index fdb7edbb6..864294d7e 100644
--- a/benchmarks/queries/clickbench/queries/q34.sql
+++ b/benchmarks/queries/clickbench/queries/q34.sql
@@ -1,4 +1,4 @@
-- Must set for ClickBench hits_partitioned dataset. See
https://github.com/apache/datafusion/issues/16591
-- set datafusion.execution.parquet.binary_as_string = true
-SELECT 1, "URL", COUNT(*) AS c FROM hits GROUP BY 1, "URL" ORDER BY c DESC
LIMIT 10;
+SELECT 1, "URL", COUNT(*) AS c FROM hits GROUP BY 1, "URL" ORDER BY c DESC;
```
As shown below, there are 735 spill counts for the SortExec operator, even
though the memory limit for the execution was set to 50G.
> spill_count=735, spilled_bytes=3.7 GB, spilled_rows=18.34 M
```
$ cargo run --profile=release-nonlto --bin dfbench -- clickbench \
--path benchmarks/data/hits_partitioned \
--queries-path benchmarks/queries/clickbench/queries \
--query 34 \
-i 1 \
--debug \
--memory-limit 50G 2>&1
Finished `release-nonlto` profile [optimized] target(s) in 0.18s
Running `target/release-nonlto/dfbench clickbench --path
benchmarks/data/hits_partitioned --queries-path
benchmarks/queries/clickbench/queries --query 34 -i 1 --debug --memory-limit
50G`
Running benchmarks with the following options: RunOpt { query: Some(34),
pushdown: false, common: CommonOpt { iterations: 1, partitions: None,
batch_size: None, mem_pool_type: "fair", memory_limit: Some(53687091200),
sort_spill_reservation_bytes: None, debug: true, simulate_latency: false },
path: "benchmarks/data/hits_partitioned", queries_path:
"benchmarks/queries/clickbench/queries", output_path: None, sorted_by: None,
sort_order: "ASC", config_options: [] }
Q34: -- Must set for ClickBench hits_partitioned dataset. See
https://github.com/apache/datafusion/issues/16591
-- set datafusion.execution.parquet.binary_as_string = true
SELECT 1, "URL", COUNT(*) AS c FROM hits GROUP BY 1, "URL" ORDER BY c DESC;
Query 34 iteration 0 took 2639.3 ms and returned 18342019 rows
+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| plan_type | plan
|
+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Plan with Metrics | SortPreservingMergeExec: [c@2 DESC],
metrics=[output_rows=18.34 M, elapsed_compute=247.86ms, output_bytes=69.3 GB,
output_batches=2.24 K]
|
| | SortExec: expr=[c@2 DESC],
preserve_partitioning=[true], metrics=[output_rows=18.34 M,
elapsed_compute=632.17ms, output_bytes=25.4 GB, output_batches=2.24 K,
spill_count=735, spilled_bytes=3.7 GB, spilled_rows=18.34 M]
|
| | ProjectionExec: expr=[1 as Int64(1), URL@0 as URL,
count(Int64(1))@1 as c], metrics=[output_rows=18.34 M, elapsed_compute=8.60ms,
output_bytes=611.9 GB, output_batches=2.24 K, expr_0_eval_time=7.03ms,
expr_1_eval_time=194.37µs, expr_2_eval_time=90.21µs]
|
| | AggregateExec: mode=FinalPartitioned, gby=[URL@0
as URL], aggr=[count(Int64(1))], metrics=[output_rows=18.34 M,
elapsed_compute=2.67s, output_bytes=611.8 GB, output_batches=2.24 K,
spill_count=0, spilled_bytes=0.0 B, spilled_rows=0, peak_mem_used=9.47 B,
aggregate_arguments_time=1.05ms, aggregation_time=82.06ms,
emitting_time=51.50µs, time_calculating_group_ids=2.58s]
|
| | RepartitionExec: partitioning=Hash([URL@0],
14), input_partitions=14, metrics=[output_rows=22.75 M, elapsed_compute=2.02ms,
output_bytes=5.5 GB, output_batches=2.79 K, spill_count=0, spilled_bytes=0.0 B,
spilled_rows=0, fetch_time=10.42s, repartition_time=364.31ms, send_time=2.04s]
|
| | AggregateExec: mode=Partial, gby=[URL@0 as
URL], aggr=[count(Int64(1))], metrics=[output_rows=22.75 M,
elapsed_compute=5.76s, output_bytes=827.3 GB, output_batches=2.78 K,
spill_count=0, spilled_bytes=0.0 B, spilled_rows=0, skipped_aggregation_rows=0,
peak_mem_used=5.39 B, aggregate_arguments_time=21.97ms,
aggregation_time=189.26ms, emitting_time=78.41µs,
time_calculating_group_ids=5.54s, reduction_factor=22.75% (22.75 M/100.00 M)]
|
| | DataSourceExec: file_groups={14 groups:
[[Users/amiculas/work/datafusion/benchmarks/data/hits_partitioned/hits_0.parquet:0..122446530,
Users/amiculas/work/datafusion/benchmarks/data/hits_partitioned/hits_1.parquet:0..174965044,
Users/amiculas/work/datafusion/benchmarks/data/hits_partitioned/hits_10.parquet:0..101513258,
Users/amiculas/work/datafusion/benchmarks/data/hits_partitioned/hits_11.parquet:0..118419888,
Users/amiculas/work/datafusion/benchmarks/data/hits_partitioned/hits_12.parquet:0..149514164,
...],
[Users/amiculas/work/datafusion/benchmarks/data/hits_partitioned/hits_15.parquet:88577877..103098894,
Users/amiculas/work/datafusion/benchmarks/data/hits_partitioned/hits_16.parquet:0..101067219,
Users/amiculas/work/datafusion/benchmarks/data/hits_partitioned/hits_17.parquet:0..116867853,
Users/amiculas/work/datafusion/benchmarks/data/hits_partitioned/hits_18.parquet:0..133119589,
Users/amiculas/work/datafusion/benchmarks/data/hits_partition
ed/hits_19.parquet:0..103692598, ...],
[Users/amiculas/work/datafusion/benchmarks/data/hits_partitioned/hits_23.parquet:73829085..79631107,
Users/amiculas/work/datafusion/benchmarks/data/hits_partitioned/hits_24.parquet:0..78257049,
Users/amiculas/work/datafusion/benchmarks/data/hits_partitioned/hits_25.parquet:0..144169728,
Users/amiculas/work/datafusion/benchmarks/data/hits_partitioned/hits_26.parquet:0..156510916,
Users/amiculas/work/datafusion/benchmarks/data/hits_partitioned/hits_27.parquet:0..166286210,
...],
[Users/amiculas/work/datafusion/benchmarks/data/hits_partitioned/hits_30.parquet:67171810..124187913,
Users/amiculas/work/datafusion/benchmarks/data/hits_partitioned/hits_31.parquet:0..123065410,
Users/amiculas/work/datafusion/benchmarks/data/hits_partitioned/hits_32.parquet:0..94506004,
Users/amiculas/work/datafusion/benchmarks/data/hits_partitioned/hits_33.parquet:0..78243765,
Users/amiculas/work/datafusion/benchmarks/data/hits_partitioned/hits_34.parquet:0..119426616,
...],
[Users/amiculas/work/datafusion/benchmarks/data/hits_partitioned/hits_39.parquet:94059938..103522954,
Users/amiculas/work/datafusion/benchmarks/data/hits_partitioned/hits_4.parquet:0..140929275,
Users/amiculas/work/datafusion/benchmarks/data/hits_partitioned/hits_40.parquet:0..142508647,
Users/amiculas/work/datafusion/benchmarks/data/hits_partitioned/hits_41.parquet:0..290614269,
Users/amiculas/work/datafusion/benchmarks/data/hits_partitioned/hits_42.parquet:0..288524057,
...], ...]}, projection=[URL], file_type=parquet, metrics=[output_rows=100.00
M, elapsed_compute=3.56ms, output_bytes=19.9 GB, output_batches=12.31 K,
files_ranges_pruned_statistics=113 total → 113 matched,
row_groups_pruned_statistics=325 total → 325 matched,
row_groups_pruned_bloom_filter=325 total → 325 matched,
page_index_pages_pruned=0 total → 0 matched, page_index_rows_pruned=0 total → 0
matched, limit_pruned_row_groups=0 total → 0 matched, batches_split=0,
bytes_scanned=2.62 B, file_open_er
rors=0, file_scan_errors=0, files_opened=113, files_processed=113,
num_predicate_creation_errors=0, predicate_evaluation_errors=0,
pushdown_rows_matched=0, pushdown_rows_pruned=0,
predicate_cache_inner_records=0, predicate_cache_records=0,
bloom_filter_eval_time=3.89µs, metadata_load_time=1.92ms,
page_index_eval_time=226ns, row_pushdown_eval_time=226ns,
statistics_eval_time=226ns, time_elapsed_opening=9.13ms,
time_elapsed_processing=4.09s, time_elapsed_scanning_total=10.39s,
time_elapsed_scanning_until_data=260.32ms, output_rows_skew=2.95%,
scan_efficiency_ratio=1637.04% (2.62 B/160.3 M)] |
| |
|
+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
Query 34 avg time: 2639.29 ms
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
