andygrove opened a new issue, #3821:
URL: https://github.com/apache/datafusion-comet/issues/3821
## Description
When running the shuffle benchmark with a configured memory pool limit, the
actual peak RSS (resident set size) reaches up to 1.78x the configured limit.
This suggests that substantial memory allocations are happening outside of the
memory pool accounting.
## Benchmark Data
**Setup:** TPCH SF100 lineitem (100M rows, 16 columns), hash partitioning
(200 partitions), lz4 compression, single iteration.
| Memory Limit | Peak RSS | RSS / Limit | Write Time | Throughput |
|---|---|---|---|---|
| 2 GB | 3.8 GB | 1.78x | 43.4s | 2.30M rows/s |
| 4 GB | 6.7 GB | 1.56x | 45.1s | 2.22M rows/s |
| 8 GB | 11.0 GB | 1.28x | 46.4s | 2.15M rows/s |
| 16 GB | 13.1 GB | 0.76x | 51.9s | 1.93M rows/s |
## Observations
1. **Peak RSS reaches up to 1.78x the memory limit.** At a 2 GB memory
limit, the process uses 3.8 GB of RSS. If the pool is fully used, nearly half
of the memory usage is untracked by the memory pool. Only the 16 GB run stays
below its limit (0.76x).
2. **Overhead outside the pool grows with the limit.** Subtracting the
memory limit from peak RSS gives ~1.8 GB of overhead at the 2 GB limit,
~2.7 GB at 4 GB, and ~3.0 GB at 8 GB. This suggests some internal structures
scale with available memory rather than being fixed-size.
3. **Higher memory limits are slower, not faster.** Write time rises with
every increase in the limit: the 16 GB run (51.9s) takes about 20% longer than
the 2 GB run (43.4s). Smaller pools may keep the working set more
cache-friendly, while larger buffers may cause more memory pressure at
flush time.
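The arithmetic behind observations 1 and 2 can be re-derived from the table. The sketch below is only a cross-check, not part of the benchmark. It computes the ratio and the overhead twice: once reading the limit as decimal GB, and once as GiB, which is an inference from the `--memory-limit 2147483648` value in the reproduction command rather than something stated in the table. The `ROWS` list and the `summarize` helper are illustrative names.

```python
# Figures copied from the Benchmark Data table: (limit label, peak RSS in GB).
ROWS = [(2, 3.8), (4, 6.7), (8, 11.0), (16, 13.1)]

GB = 10**9   # decimal gigabyte
GIB = 2**30  # binary gibibyte; 2 GiB == 2147483648 bytes


def summarize(limit_unit_bytes):
    """Return (limit, RSS/limit ratio, RSS - limit in GB) for each table row."""
    out = []
    for limit, rss_gb in ROWS:
        limit_gb = limit * limit_unit_bytes / GB
        ratio = round(rss_gb / limit_gb, 2)
        overhead_gb = round(rss_gb - limit_gb, 2)
        out.append((limit, ratio, overhead_gb))
    return out


for name, unit in (("GB", GB), ("GiB", GIB)):
    print(f"limit interpreted as {name}")
    for limit, ratio, overhead in summarize(unit):
        print(f"  {limit:>2}: RSS/limit = {ratio:.2f}x, RSS - limit = {overhead:+.2f} GB")
```

Under the GiB reading, the computed ratios come out close to the RSS / Limit column, while the overhead figures quoted in observation 2 correspond to plain subtraction of the nominal numbers. If the limits are indeed GiB, the untracked overhead would be somewhat smaller than quoted, so it may be worth confirming the units before drawing conclusions about how the overhead scales.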
## Expected Behavior
The memory pool limit should more closely bound the total process memory
usage. Untracked allocations (Arrow buffers, Parquet reader state, partition
writer buffers, etc.) should either be accounted for in the pool or documented
as expected overhead.
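To make the accounting gap concrete, here is a toy model of how untracked allocations let total memory exceed a pool limit even when every tracked reservation respects it. `ToyPool` and `simulate` are hypothetical names for illustration only. They are not the Comet or DataFusion API, and the flush behaviour is an assumption.

```python
class ToyPool:
    """Hypothetical bounded pool: reservations beyond the limit are refused."""

    def __init__(self, limit):
        self.limit = limit
        self.reserved = 0

    def try_grow(self, nbytes):
        if self.reserved + nbytes > self.limit:
            return False  # the caller is expected to flush or spill
        self.reserved += nbytes
        return True

    def shrink(self, nbytes):
        self.reserved = max(0, self.reserved - nbytes)


def simulate(limit, batch_bytes, untracked_per_batch, n_batches):
    """Return peak total memory (tracked + untracked) over n_batches."""
    pool = ToyPool(limit)
    untracked = 0
    peak = 0
    for _ in range(n_batches):
        if not pool.try_grow(batch_bytes):
            # Assumed flush: all buffered data is written out and released.
            pool.shrink(pool.reserved)
            untracked = 0
            pool.try_grow(batch_bytes)
        # Side allocation the pool never hears about (e.g. scratch buffers).
        untracked += untracked_per_batch
        peak = max(peak, pool.reserved + untracked)
    return peak


tracked_only = simulate(limit=100, batch_bytes=10, untracked_per_batch=0, n_batches=50)
with_untracked = simulate(limit=100, batch_bytes=10, untracked_per_batch=5, n_batches=50)
print(tracked_only, with_untracked)
```

In this model the pool never goes over its limit, yet the peak total does whenever each batch carries an untracked side allocation. Either routing those allocations through the pool or documenting their expected size would close the gap.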
## Reproduction
```sh
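# Build the benchmark binary.
cargo build --release --bin shuffle_bench --features shuffle-bench
# Run with a 2147483648-byte (2 GiB) memory limit. `-l` is the BSD/macOS flag
# that reports maximum resident set size; GNU time uses `-v` instead.
/usr/bin/time -l ./native/target/release/shuffle_bench \
  --input /opt/tpch/sf100/lineitem \
  --codec lz4 \
  --partitions 200 \
  --hash-columns 0,3 \
  --memory-limit 2147483648 \
  --limit 100000000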
```
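To repeat the measurement across several limits, a small wrapper along these lines could build the command and pull the peak RSS out of the `time -l` output. This is a sketch under assumptions: the helper names are made up, the regular expression assumes the BSD/macOS output format (a number followed by `maximum resident set size`), and the unit of that number should be checked on the platform in use.

```python
import re
import subprocess

# Binary path and arguments are taken from the reproduction command above.
BENCH = "./native/target/release/shuffle_bench"


def build_command(limit_bytes):
    """Return the argv list for one benchmark run at the given memory limit."""
    return [
        "/usr/bin/time", "-l", BENCH,
        "--input", "/opt/tpch/sf100/lineitem",
        "--codec", "lz4",
        "--partitions", "200",
        "--hash-columns", "0,3",
        "--memory-limit", str(limit_bytes),
        "--limit", "100000000",
    ]


# Assumed output line shape: "   3800000000  maximum resident set size".
_RSS_LINE = re.compile(r"^\s*(\d+)\s+maximum resident set size", re.MULTILINE)


def parse_peak_rss(time_output):
    """Extract the peak RSS figure from `time -l` output, or None if absent."""
    match = _RSS_LINE.search(time_output)
    return int(match.group(1)) if match else None


def run_once(limit_bytes):
    """Run the benchmark once and return the reported peak RSS."""
    proc = subprocess.run(build_command(limit_bytes), capture_output=True, text=True)
    # time(1) writes its resource report to stderr.
    return parse_peak_rss(proc.stderr)
```

Calling `run_once(2 * 2**30)`, `run_once(4 * 2**30)`, and so on would regenerate the Peak RSS column, assuming the binary has been built and the input path exists.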