kevinjqliu opened a new pull request, #3126:
URL: https://github.com/apache/iceberg-python/pull/3126
# Rationale for this change
```
➜ uv run pytest tests/benchmark/test_arrow_scan_benchmark.py -m benchmark -s
======================================================================= test session starts =======================================================================
platform darwin -- Python 3.10.19, pytest-9.0.2, pluggy-1.6.0
rootdir: /Users/kevinliu/repos/iceberg-python
configfile: pyproject.toml
plugins: mock-3.15.1, anyio-4.11.0, lazy-fixtures-1.4.0, checkdocs-2.14.0, requests-mock-1.12.1
collected 1 item
tests/benchmark/test_arrow_scan_benchmark.py
--- ArrowScan.to_record_batches Benchmark (Comparison) ---
runs_per_shape=10, warmup_runs_per_shape=2, sleep_between_scenarios_sec=0.5, files=32, target_file_size_mb=50 (memory only: arr_mb, rss_delta_mb)
| implementation                         | worker_setting | num_files | file_size_mb_avg | total_rows | total_batches | full_scan_time_ms_avg | full_scan_time_ms_max | arrow_peak_mb_avg | rss_peak_delta_mb_avg | arrow_peak_mb_max | rss_peak_delta_mb_max |
| -------------------------------------- | -------------- | --------- | ---------------- | ---------- | ------------- | --------------------- | --------------------- | ----------------- | --------------------- | ----------------- | --------------------- |
| baseline (fully materialize all tasks) | 1              | 32        | 49.10            | 4324000    | 288           | 132.65                | 163.01                | 49.11             | 0.03                  | 49.11             | 0.11                  |
| bounded_queue                          | 1              | 32        | 49.10            | 4324000    | 288           | 142.43                | 154.39                | 100.19            | 0.00                  | 103.92            | 0.03                  |
| lazy                                   | 1              | 32        | 49.10            | 4324000    | 288           | 131.48                | 148.50                | 101.38            | 0.00                  | 103.85            | 0.00                  |
| lazy_warmup                            | 1              | 32        | 49.10            | 4324000    | 288           | 125.48                | 158.24                | 106.02            | 0.00                  | 134.75            | 0.00                  |
| baseline (fully materialize all tasks) | 2              | 32        | 49.10            | 4324000    | 288           | 87.65                 | 92.75                 | 135.00            | 0.04                  | 159.81            | 0.12                  |
| bounded_queue                          | 2              | 32        | 49.10            | 4324000    | 288           | 97.39                 | 105.58                | 201.49            | 0.24                  | 204.91            | 1.52                  |
| lazy                                   | 2              | 32        | 49.10            | 4324000    | 288           | 126.66                | 131.47                | 100.88            | 0.00                  | 102.35            | 0.00                  |
| lazy_warmup                            | 2              | 32        | 49.10            | 4324000    | 288           | 79.60                 | 83.19                 | 213.08            | 0.36                  | 244.99            | 3.56                  |
| baseline (fully materialize all tasks) | 4              | 32        | 49.10            | 4324000    | 288           | 66.89                 | 81.48                 | 308.17            | 0.05                  | 343.86            | 0.27                  |
| bounded_queue                          | 4              | 32        | 49.10            | 4324000    | 288           | 73.09                 | 78.14                 | 394.04            | 0.01                  | 401.54            | 0.06                  |
| lazy                                   | 4              | 32        | 49.10            | 4324000    | 288           | 127.57                | 132.25                | 103.22            | 0.00                  | 109.17            | 0.00                  |
| lazy_warmup                            | 4              | 32        | 49.10            | 4324000    | 288           | 62.09                 | 82.48                 | 504.49            | 0.53                  | 582.62            | 2.30                  |
| baseline (fully materialize all tasks) | 8              | 32        | 49.10            | 4324000    | 288           | 61.22                 | 63.91                 | 699.60            | 12.08                 | 826.30            | 37.50                 |
| bounded_queue                          | 8              | 32        | 49.10            | 4324000    | 288           | 66.69                 | 73.07                 | 752.00            | 0.60                  | 787.62            | 3.34                  |
| lazy                                   | 8              | 32        | 49.10            | 4324000    | 288           | 125.74                | 127.10                | 101.36            | 0.00                  | 106.66            | 0.05                  |
| lazy_warmup                            | 8              | 32        | 49.10            | 4324000    | 288           | 58.10                 | 60.26                 | 1991.14           | 1.85                  | 2429.90           | 9.08                  |
| baseline (fully materialize all tasks) | 16             | 32        | 49.10            | 4324000    | 288           | 60.33                 | 62.30                 | 1585.96           | 2.29                  | 1715.55           | 7.94                  |
| bounded_queue                          | 16             | 32        | 49.10            | 4324000    | 288           | 66.26                 | 77.49                 | 1335.69           | 1.31                  | 1482.20           | 10.48                 |
| lazy                                   | 16             | 32        | 49.10            | 4324000    | 288           | 128.26                | 133.57                | 100.75            | 0.00                  | 102.29            | 0.00                  |
| lazy_warmup                            | 16             | 32        | 49.10            | 4324000    | 288           | 57.81                 | 60.90                 | 2763.34           | 2.22                  | 3079.33           | 9.12                  |
| baseline (fully materialize all tasks) | default (18)   | 32        | 49.10            | 4324000    | 288           | 63.72                 | 72.10                 | 1680.22           | 54.69                 | 1822.33           | 177.28                |
| bounded_queue                          | default (18)   | 32        | 49.10            | 4324000    | 288           | 64.19                 | 69.11                 | 1506.08           | 3.60                  | 1683.01           | 13.53                 |
| lazy                                   | default (18)   | 32        | 49.10            | 4324000    | 288           | 138.37                | 180.34                | 102.41            | 0.00                  | 106.72            | 0.00                  |
| lazy_warmup                            | default (18)   | 32        | 49.10            | 4324000    | 288           | 59.35                 | 66.66                 | 2823.83           | 7.30                  | 3105.95           | 36.11                 |
| baseline (fully materialize all tasks) | 32             | 32        | 49.10            | 4324000    | 288           | 70.89                 | 102.28                | 2099.31           | 88.90                 | 2454.51           | 260.28                |
| bounded_queue                          | 32             | 32        | 49.10            | 4324000    | 288           | 63.24                 | 66.65                 | 2276.13           | 9.70                  | 2850.23           | 48.03                 |
| lazy                                   | 32             | 32        | 49.10            | 4324000    | 288           | 128.86                | 138.36                | 102.16            | 0.01                  | 106.72            | 0.12                  |
| lazy_warmup                            | 32             | 32        | 49.10            | 4324000    | 288           | 60.45                 | 73.04                 | 2846.71           | 11.87                 | 3030.61           | 55.73                 |
saved graph: tests/benchmark/artifacts/arrow_scan_benchmark_relationships.png
```
<img width="2240" height="800" alt="arrow_scan_benchmark_relationships"
src="https://github.com/user-attachments/assets/e41e9293-a27a-450e-9640-a6e27fb001aa"
/>
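
One way to read the benchmark numbers is to express each implementation relative to the baseline at a fixed worker setting. The snippet below is a minimal sketch of that arithmetic, not part of the benchmark itself: the values are copied from the `worker_setting = 8` rows of the table, and the helper name `relative_to_baseline` and the shortened `baseline` key are illustrative assumptions.

```python
# Values copied from the worker_setting = 8 rows of the benchmark table.
# Tuple layout: (full_scan_time_ms_avg, arrow_peak_mb_avg)
rows_at_8_workers = {
    "baseline": (61.22, 699.60),
    "bounded_queue": (66.69, 752.00),
    "lazy": (125.74, 101.36),
    "lazy_warmup": (58.10, 1991.14),
}


def relative_to_baseline(rows):
    """Return {implementation: (time_ratio, memory_ratio)} versus the baseline row.

    Hypothetical helper for illustration; it is not defined in the PR.
    """
    base_time, base_mem = rows["baseline"]
    return {
        name: (time_ms / base_time, arrow_mb / base_mem)
        for name, (time_ms, arrow_mb) in rows.items()
    }


ratios = relative_to_baseline(rows_at_8_workers)
for name, (time_ratio, mem_ratio) in ratios.items():
    print(f"{name:14s} time x{time_ratio:.2f}  arrow peak x{mem_ratio:.2f}")
```

By these numbers, at 8 workers `lazy` takes roughly twice the baseline scan time while peaking at about a seventh of the baseline Arrow memory, and `lazy_warmup` is slightly faster than baseline at close to three times the Arrow peak.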
## Are these changes tested?
## Are there any user-facing changes?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]