ariel-miculas commented on issue #22526:
URL: https://github.com/apache/datafusion/issues/22526#issuecomment-4568611822

   The spilling issue occurs when there's an operator which accumulates the 
input partitions, such as `SortExec`.
   This can be achieved by removing `limit 10` from q34:
   ```
   diff --git a/benchmarks/queries/clickbench/queries/q34.sql 
b/benchmarks/queries/clickbench/queries/q34.sql
   index fdb7edbb6..864294d7e 100644
   --- a/benchmarks/queries/clickbench/queries/q34.sql
   +++ b/benchmarks/queries/clickbench/queries/q34.sql
   @@ -1,4 +1,4 @@
    -- Must set for ClickBench hits_partitioned dataset. See 
https://github.com/apache/datafusion/issues/16591
    -- set datafusion.execution.parquet.binary_as_string = true
   
   -SELECT 1, "URL", COUNT(*) AS c FROM hits GROUP BY 1, "URL" ORDER BY c DESC 
LIMIT 10;
   +SELECT 1, "URL", COUNT(*) AS c FROM hits GROUP BY 1, "URL" ORDER BY c DESC;
   
   ```
   
   As shown below, there are 735 spill counts for the SortExec operator, even 
though the memory limit for the execution was set to 50G.
   >  spill_count=735, spilled_bytes=3.7 GB, spilled_rows=18.34 M
   
   ```
   $ cargo run --profile=release-nonlto --bin dfbench -- clickbench \
           --path benchmarks/data/hits_partitioned \
           --queries-path benchmarks/queries/clickbench/queries \
           --query 34 \
           -i 1 \
           --debug \
           --memory-limit 50G 2>&1
   
       Finished `release-nonlto` profile [optimized] target(s) in 0.18s
        Running `target/release-nonlto/dfbench clickbench --path 
benchmarks/data/hits_partitioned --queries-path 
benchmarks/queries/clickbench/queries --query 34 -i 1 --debug --memory-limit 
50G`
   Running benchmarks with the following options: RunOpt { query: Some(34), 
pushdown: false, common: CommonOpt { iterations: 1, partitions: None, 
batch_size: None, mem_pool_type: "fair", memory_limit: Some(53687091200), 
sort_spill_reservation_bytes: None, debug: true, simulate_latency: false }, 
path: "benchmarks/data/hits_partitioned", queries_path: 
"benchmarks/queries/clickbench/queries", output_path: None, sorted_by: None, 
sort_order: "ASC", config_options: [] }
   Q34: -- Must set for ClickBench hits_partitioned dataset. See 
https://github.com/apache/datafusion/issues/16591
   -- set datafusion.execution.parquet.binary_as_string = true
   
   SELECT 1, "URL", COUNT(*) AS c FROM hits GROUP BY 1, "URL" ORDER BY c DESC;
   
   Query 34 iteration 0 took 2639.3 ms and returned 18342019 rows
   
+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   | plan_type         | plan                                                   
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                         |
   
+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   | Plan with Metrics | SortPreservingMergeExec: [c@2 DESC], 
metrics=[output_rows=18.34 M, elapsed_compute=247.86ms, output_bytes=69.3 GB, 
output_batches=2.24 K]                                                          
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                          
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                         |
   |                   |   SortExec: expr=[c@2 DESC], 
preserve_partitioning=[true], metrics=[output_rows=18.34 M, 
elapsed_compute=632.17ms, output_bytes=25.4 GB, output_batches=2.24 K, 
spill_count=735, spilled_bytes=3.7 GB, spilled_rows=18.34 M]                    
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
             
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                         |
   |                   |     ProjectionExec: expr=[1 as Int64(1), URL@0 as URL, 
count(Int64(1))@1 as c], metrics=[output_rows=18.34 M, elapsed_compute=8.60ms, 
output_bytes=611.9 GB, output_batches=2.24 K, expr_0_eval_time=7.03ms, 
expr_1_eval_time=194.37µs, expr_2_eval_time=90.21µs]                            
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                              
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                           |
   |                   |       AggregateExec: mode=FinalPartitioned, gby=[URL@0 
as URL], aggr=[count(Int64(1))], metrics=[output_rows=18.34 M, 
elapsed_compute=2.67s, output_bytes=611.8 GB, output_batches=2.24 K, 
spill_count=0, spilled_bytes=0.0 B, spilled_rows=0, peak_mem_used=9.47 B, 
aggregate_arguments_time=1.05ms, aggregation_time=82.06ms, 
emitting_time=51.50µs, time_calculating_group_ids=2.58s]                        
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
            
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                          |
   |                   |         RepartitionExec: partitioning=Hash([URL@0], 
14), input_partitions=14, metrics=[output_rows=22.75 M, elapsed_compute=2.02ms, 
output_bytes=5.5 GB, output_batches=2.79 K, spill_count=0, spilled_bytes=0.0 B, 
spilled_rows=0, fetch_time=10.42s, repartition_time=364.31ms, send_time=2.04s]  
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                         
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                         |
   |                   |           AggregateExec: mode=Partial, gby=[URL@0 as 
URL], aggr=[count(Int64(1))], metrics=[output_rows=22.75 M, 
elapsed_compute=5.76s, output_bytes=827.3 GB, output_batches=2.78 K, 
spill_count=0, spilled_bytes=0.0 B, spilled_rows=0, skipped_aggregation_rows=0, 
peak_mem_used=5.39 B, aggregate_arguments_time=21.97ms, 
aggregation_time=189.26ms, emitting_time=78.41µs, 
time_calculating_group_ids=5.54s, reduction_factor=22.75% (22.75 M/100.00 M)]   
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                            
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                          |
   |                   |             DataSourceExec: file_groups={14 groups: 
[[Users/amiculas/work/datafusion/benchmarks/data/hits_partitioned/hits_0.parquet:0..122446530,
 
Users/amiculas/work/datafusion/benchmarks/data/hits_partitioned/hits_1.parquet:0..174965044,
 
Users/amiculas/work/datafusion/benchmarks/data/hits_partitioned/hits_10.parquet:0..101513258,
 
Users/amiculas/work/datafusion/benchmarks/data/hits_partitioned/hits_11.parquet:0..118419888,
 
Users/amiculas/work/datafusion/benchmarks/data/hits_partitioned/hits_12.parquet:0..149514164,
 ...], 
[Users/amiculas/work/datafusion/benchmarks/data/hits_partitioned/hits_15.parquet:88577877..103098894,
 
Users/amiculas/work/datafusion/benchmarks/data/hits_partitioned/hits_16.parquet:0..101067219,
 
Users/amiculas/work/datafusion/benchmarks/data/hits_partitioned/hits_17.parquet:0..116867853,
 
Users/amiculas/work/datafusion/benchmarks/data/hits_partitioned/hits_18.parquet:0..133119589,
 Users/amiculas/work/datafusion/benchmarks/data/hits_partition
 ed/hits_19.parquet:0..103692598, ...], 
[Users/amiculas/work/datafusion/benchmarks/data/hits_partitioned/hits_23.parquet:73829085..79631107,
 
Users/amiculas/work/datafusion/benchmarks/data/hits_partitioned/hits_24.parquet:0..78257049,
 
Users/amiculas/work/datafusion/benchmarks/data/hits_partitioned/hits_25.parquet:0..144169728,
 
Users/amiculas/work/datafusion/benchmarks/data/hits_partitioned/hits_26.parquet:0..156510916,
 
Users/amiculas/work/datafusion/benchmarks/data/hits_partitioned/hits_27.parquet:0..166286210,
 ...], 
[Users/amiculas/work/datafusion/benchmarks/data/hits_partitioned/hits_30.parquet:67171810..124187913,
 
Users/amiculas/work/datafusion/benchmarks/data/hits_partitioned/hits_31.parquet:0..123065410,
 
Users/amiculas/work/datafusion/benchmarks/data/hits_partitioned/hits_32.parquet:0..94506004,
 
Users/amiculas/work/datafusion/benchmarks/data/hits_partitioned/hits_33.parquet:0..78243765,
 
Users/amiculas/work/datafusion/benchmarks/data/hits_partitioned/hits_34.parquet:0..119426616,
 
 ...], 
[Users/amiculas/work/datafusion/benchmarks/data/hits_partitioned/hits_39.parquet:94059938..103522954,
 
Users/amiculas/work/datafusion/benchmarks/data/hits_partitioned/hits_4.parquet:0..140929275,
 
Users/amiculas/work/datafusion/benchmarks/data/hits_partitioned/hits_40.parquet:0..142508647,
 
Users/amiculas/work/datafusion/benchmarks/data/hits_partitioned/hits_41.parquet:0..290614269,
 
Users/amiculas/work/datafusion/benchmarks/data/hits_partitioned/hits_42.parquet:0..288524057,
 ...], ...]}, projection=[URL], file_type=parquet, metrics=[output_rows=100.00 
M, elapsed_compute=3.56ms, output_bytes=19.9 GB, output_batches=12.31 K, 
files_ranges_pruned_statistics=113 total → 113 matched, 
row_groups_pruned_statistics=325 total → 325 matched, 
row_groups_pruned_bloom_filter=325 total → 325 matched, 
page_index_pages_pruned=0 total → 0 matched, page_index_rows_pruned=0 total → 0 
matched, limit_pruned_row_groups=0 total → 0 matched, batches_split=0, 
bytes_scanned=2.62 B, file_open_er
 rors=0, file_scan_errors=0, files_opened=113, files_processed=113, 
num_predicate_creation_errors=0, predicate_evaluation_errors=0, 
pushdown_rows_matched=0, pushdown_rows_pruned=0, 
predicate_cache_inner_records=0, predicate_cache_records=0, 
bloom_filter_eval_time=3.89µs, metadata_load_time=1.92ms, 
page_index_eval_time=226ns, row_pushdown_eval_time=226ns, 
statistics_eval_time=226ns, time_elapsed_opening=9.13ms, 
time_elapsed_processing=4.09s, time_elapsed_scanning_total=10.39s, 
time_elapsed_scanning_until_data=260.32ms, output_rows_skew=2.95%, 
scan_efficiency_ratio=1637.04% (2.62 B/160.3 M)] |
   |                   |                                                        
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                      
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                                                                                
                         |
   
+-------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
   Query 34 avg time: 2639.29 ms
   ``` 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]


Reply via email to