[PR] add benchmark for arrow scan [iceberg-python]

via GitHub Sat, 07 Mar 2026 16:57:10 -0800


kevinjqliu opened a new pull request, #3126:
URL: https://github.com/apache/iceberg-python/pull/3126


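   # Rationale for this change
   
   This PR adds a benchmark, `tests/benchmark/test_arrow_scan_benchmark.py` (selected with `-m benchmark`). It compares four `ArrowScan.to_record_batches` strategies (`baseline (fully materialize all tasks)`, `bounded_queue`, `lazy`, and `lazy_warmup`) across several worker settings on 32 files of about 49 MB each. For each combination it reports full-scan time and peak memory (Arrow allocations and RSS growth), and it saves a comparison graph. Sample run:
   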
   ```
   ➜ uv run pytest tests/benchmark/test_arrow_scan_benchmark.py -m benchmark -s 
   ======================================================================= test session starts =======================================================================
   platform darwin -- Python 3.10.19, pytest-9.0.2, pluggy-1.6.0
   rootdir: /Users/kevinliu/repos/iceberg-python
   configfile: pyproject.toml
   plugins: mock-3.15.1, anyio-4.11.0, lazy-fixtures-1.4.0, checkdocs-2.14.0, requests-mock-1.12.1
   collected 1 item
   
   tests/benchmark/test_arrow_scan_benchmark.py
   --- ArrowScan.to_record_batches Benchmark (Comparison) ---
   runs_per_shape=10, warmup_runs_per_shape=2, sleep_between_scenarios_sec=0.5, files=32, target_file_size_mb=50 (memory only: arr_mb, rss_delta_mb)
   | implementation                         | worker_setting | num_files | file_size_mb_avg | total_rows | total_batches | full_scan_time_ms_avg | full_scan_time_ms_max | arrow_peak_mb_avg | rss_peak_delta_mb_avg | arrow_peak_mb_max | rss_peak_delta_mb_max |
   | -------------------------------------- | -------------- | --------- | ---------------- | ---------- | ------------- | --------------------- | --------------------- | ----------------- | --------------------- | ----------------- | --------------------- |
   | baseline (fully materialize all tasks) | 1              | 32        | 49.10            | 4324000    | 288           | 132.65                | 163.01                | 49.11             | 0.03                  | 49.11             | 0.11                  |
   | bounded_queue                          | 1              | 32        | 49.10            | 4324000    | 288           | 142.43                | 154.39                | 100.19            | 0.00                  | 103.92            | 0.03                  |
   | lazy                                   | 1              | 32        | 49.10            | 4324000    | 288           | 131.48                | 148.50                | 101.38            | 0.00                  | 103.85            | 0.00                  |
   | lazy_warmup                            | 1              | 32        | 49.10            | 4324000    | 288           | 125.48                | 158.24                | 106.02            | 0.00                  | 134.75            | 0.00                  |
   | baseline (fully materialize all tasks) | 2              | 32        | 49.10            | 4324000    | 288           | 87.65                 | 92.75                 | 135.00            | 0.04                  | 159.81            | 0.12                  |
   | bounded_queue                          | 2              | 32        | 49.10            | 4324000    | 288           | 97.39                 | 105.58                | 201.49            | 0.24                  | 204.91            | 1.52                  |
   | lazy                                   | 2              | 32        | 49.10            | 4324000    | 288           | 126.66                | 131.47                | 100.88            | 0.00                  | 102.35            | 0.00                  |
   | lazy_warmup                            | 2              | 32        | 49.10            | 4324000    | 288           | 79.60                 | 83.19                 | 213.08            | 0.36                  | 244.99            | 3.56                  |
   | baseline (fully materialize all tasks) | 4              | 32        | 49.10            | 4324000    | 288           | 66.89                 | 81.48                 | 308.17            | 0.05                  | 343.86            | 0.27                  |
   | bounded_queue                          | 4              | 32        | 49.10            | 4324000    | 288           | 73.09                 | 78.14                 | 394.04            | 0.01                  | 401.54            | 0.06                  |
   | lazy                                   | 4              | 32        | 49.10            | 4324000    | 288           | 127.57                | 132.25                | 103.22            | 0.00                  | 109.17            | 0.00                  |
   | lazy_warmup                            | 4              | 32        | 49.10            | 4324000    | 288           | 62.09                 | 82.48                 | 504.49            | 0.53                  | 582.62            | 2.30                  |
   | baseline (fully materialize all tasks) | 8              | 32        | 49.10            | 4324000    | 288           | 61.22                 | 63.91                 | 699.60            | 12.08                 | 826.30            | 37.50                 |
   | bounded_queue                          | 8              | 32        | 49.10            | 4324000    | 288           | 66.69                 | 73.07                 | 752.00            | 0.60                  | 787.62            | 3.34                  |
   | lazy                                   | 8              | 32        | 49.10            | 4324000    | 288           | 125.74                | 127.10                | 101.36            | 0.00                  | 106.66            | 0.05                  |
   | lazy_warmup                            | 8              | 32        | 49.10            | 4324000    | 288           | 58.10                 | 60.26                 | 1991.14           | 1.85                  | 2429.90           | 9.08                  |
   | baseline (fully materialize all tasks) | 16             | 32        | 49.10            | 4324000    | 288           | 60.33                 | 62.30                 | 1585.96           | 2.29                  | 1715.55           | 7.94                  |
   | bounded_queue                          | 16             | 32        | 49.10            | 4324000    | 288           | 66.26                 | 77.49                 | 1335.69           | 1.31                  | 1482.20           | 10.48                 |
   | lazy                                   | 16             | 32        | 49.10            | 4324000    | 288           | 128.26                | 133.57                | 100.75            | 0.00                  | 102.29            | 0.00                  |
   | lazy_warmup                            | 16             | 32        | 49.10            | 4324000    | 288           | 57.81                 | 60.90                 | 2763.34           | 2.22                  | 3079.33           | 9.12                  |
   | baseline (fully materialize all tasks) | default (18)   | 32        | 49.10            | 4324000    | 288           | 63.72                 | 72.10                 | 1680.22           | 54.69                 | 1822.33           | 177.28                |
   | bounded_queue                          | default (18)   | 32        | 49.10            | 4324000    | 288           | 64.19                 | 69.11                 | 1506.08           | 3.60                  | 1683.01           | 13.53                 |
   | lazy                                   | default (18)   | 32        | 49.10            | 4324000    | 288           | 138.37                | 180.34                | 102.41            | 0.00                  | 106.72            | 0.00                  |
   | lazy_warmup                            | default (18)   | 32        | 49.10            | 4324000    | 288           | 59.35                 | 66.66                 | 2823.83           | 7.30                  | 3105.95           | 36.11                 |
   | baseline (fully materialize all tasks) | 32             | 32        | 49.10            | 4324000    | 288           | 70.89                 | 102.28                | 2099.31           | 88.90                 | 2454.51           | 260.28                |
   | bounded_queue                          | 32             | 32        | 49.10            | 4324000    | 288           | 63.24                 | 66.65                 | 2276.13           | 9.70                  | 2850.23           | 48.03                 |
   | lazy                                   | 32             | 32        | 49.10            | 4324000    | 288           | 128.86                | 138.36                | 102.16            | 0.01                  | 106.72            | 0.12                  |
   | lazy_warmup                            | 32             | 32        | 49.10            | 4324000    | 288           | 60.45                 | 73.04                 | 2846.71           | 11.87                 | 3030.61           | 55.73                 |
   saved graph: tests/benchmark/artifacts/arrow_scan_benchmark_relationships.png
   ```
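
The run parameters printed above (`runs_per_shape=10`, `warmup_runs_per_shape=2`, `sleep_between_scenarios_sec=0.5`) and the `_avg`/`_max` time columns suggest a measurement loop roughly like the one below. This is a minimal, stdlib-only sketch under that assumption, not the code in this PR: `run_scenario`, `run_all`, and `ScanStats` are hypothetical names, and `scan` stands in for any zero-argument callable that returns an iterable of batches (for example, a wrapper around `ArrowScan.to_record_batches`), with `len(batch)` taken as the row count. The memory columns (`arrow_peak_mb_*`, `rss_peak_delta_mb_*`) are not modeled here.

```python
import statistics
import time
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Sized

# Hypothetical type: a zero-argument callable returning an iterable of batches.
ScanFn = Callable[[], Iterable[Sized]]


@dataclass
class ScanStats:
    """Summary of one (implementation, worker_setting) scenario."""

    total_rows: int
    total_batches: int
    time_ms_avg: float
    time_ms_max: float


def run_scenario(scan: ScanFn, runs: int = 10, warmup_runs: int = 2) -> ScanStats:
    """Time full scans of `scan`, discarding the warmup runs."""
    if runs < 1:
        raise ValueError("runs must be >= 1")

    # Warmup runs execute the scan but their timings are not recorded.
    for _ in range(warmup_runs):
        for _batch in scan():
            pass

    timings_ms = []
    rows = batches = 0
    for _ in range(runs):
        rows = batches = 0
        start = time.perf_counter()
        # Drain the iterator completely so lazy strategies do all of their
        # work inside the timed region.
        for batch in scan():
            rows += len(batch)
            batches += 1
        timings_ms.append((time.perf_counter() - start) * 1000.0)

    # Row and batch counts are taken from the last measured run.
    return ScanStats(rows, batches, statistics.mean(timings_ms), max(timings_ms))


def run_all(
    scenarios: Dict[str, ScanFn],
    sleep_between_sec: float = 0.5,
    runs: int = 10,
    warmup_runs: int = 2,
) -> Dict[str, ScanStats]:
    """Run each named scenario in turn, pausing between scenarios."""
    results = {}
    for i, (name, scan) in enumerate(scenarios.items()):
        if i > 0:
            # Pause between scenarios (cf. sleep_between_scenarios_sec above).
            time.sleep(sleep_between_sec)
        results[name] = run_scenario(scan, runs=runs, warmup_runs=warmup_runs)
    return results
```

Draining each iterator fully inside the timed region matters for the lazy strategies: stopping at the first batch would time only the setup, not the scan.
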
   <img width="2240" height="800" alt="arrow_scan_benchmark_relationships" src="https://github.com/user-attachments/assets/e41e9293-a27a-450e-9640-a6e27fb001aa" />
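
The four implementation labels are not defined in this description. One plausible reading, sketched below as an assumption rather than the PR's actual code, is that `baseline` reads every task in parallel and holds all of its batches before yielding any, `lazy` reads one task at a time on demand, and `bounded_queue` lets worker threads push batches into a size-capped `queue.Queue` that the consumer drains, so backpressure limits how much data is in flight. `lazy_warmup` is not modeled. The function names, the `read(task)` callable, and the `maxsize` default are hypothetical; a task stands in for one data file.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")  # a scan task, e.g. one data file (hypothetical)
B = TypeVar("B")  # a record batch

_DONE = object()  # sentinel: one producer has finished


def materialize_all(tasks: Iterable[T], read: Callable[[T], Iterable[B]], max_workers: int) -> Iterator[B]:
    """'baseline' reading: read all tasks in parallel, keep every batch, then yield."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_task: List[List[B]] = list(pool.map(lambda t: list(read(t)), tasks))
    for batches in per_task:
        yield from batches


def lazy(tasks: Iterable[T], read: Callable[[T], Iterable[B]]) -> Iterator[B]:
    """'lazy' reading: one task at a time, on demand; no worker threads."""
    for task in tasks:
        yield from read(task)


def bounded_queue(
    tasks: Iterable[T], read: Callable[[T], Iterable[B]], max_workers: int, maxsize: int = 8
) -> Iterator[B]:
    """'bounded_queue' reading: workers push batches into a size-capped queue."""
    q = queue.Queue(maxsize=maxsize)
    stop = threading.Event()

    def produce(task: T) -> None:
        try:
            for batch in read(task):
                if stop.is_set():
                    return
                q.put(batch)  # blocks while the queue is full (backpressure)
        finally:
            q.put(_DONE)

    pool = ThreadPoolExecutor(max_workers=max_workers)
    futures = [pool.submit(produce, t) for t in tasks]
    remaining = len(futures)
    try:
        while remaining:
            item = q.get()
            if item is _DONE:
                remaining -= 1
            else:
                yield item
        for f in futures:
            f.result()  # re-raise any exception from a producer
    finally:
        # Runs on normal completion and when the consumer stops early.
        stop.set()
        for f in futures:
            f.cancel()  # drop tasks that have not started yet
        # Keep draining so producers blocked in q.put() can exit.
        while not all(f.done() for f in futures):
            try:
                q.get(timeout=0.01)
            except queue.Empty:
                pass
        pool.shutdown(wait=True)
```

Under this reading, `lazy` ignores the worker count, which would be consistent with its roughly flat full-scan time (about 125 to 138 ms) and Arrow peak (about 100 MB) across worker settings in the table above. This `bounded_queue` sketch yields batches in completion order rather than file order; whether the PR's implementation preserves order is not stated here.
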
   
   ## Are these changes tested?
   
   ## Are there any user-facing changes?
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

