leekeiabstraction opened a new pull request, #266: URL: https://github.com/apache/flink-docker/pull/266

## What changes were proposed in this pull request?

`docker-entrypoint.sh` already loads jemalloc via `LD_PRELOAD` but leaves `narenas` at its default of `4 * ncpus`. `ncpus` is read from `/proc/cpuinfo`, which reflects the **host** CPU count, not the container's CPU limit. On a large host running a CPU-limited container this over-provisions arenas, and because each idle arena holds dirty pages until `dirty_decay_ms`, anon RSS inflates beyond what jemalloc actually needs.

This patch derives `narenas` from the container's cgroup CPU quota:

- cgroup v2 → `/sys/fs/cgroup/cpu.max`
- cgroup v1 → `/sys/fs/cgroup/cpu/cpu.cfs_quota_us` + `cpu.cfs_period_us`
- Fallback → `nproc` (handles cpuset-pinned pods and unlimited setups)

It then sets `MALLOC_CONF=narenas:<N>`, deferring to any user-supplied `narenas` in `MALLOC_CONF` and appending (rather than overwriting) other user-supplied `MALLOC_CONF` values.

## Why are the changes needed?

`nproc` honors `cpuset` (sched affinity) but **not** CPU quotas, so it doesn't help here: Docker `--cpus=N` and Kubernetes CPU limits both express themselves as quotas, not cpuset. Reading the cgroup files directly is the only reliable signal inside a container.

## Verifying this change

Reproduced on Docker Desktop with 4 TaskManagers per cluster (2 GB process size, 1 CPU each, RocksDB state backend, datagen → temporal join → blackhole, 5-minute sample windows, two runs each):

| metric | OSS Flink 2.2.1 (mean of 2 runs) | patched (mean of 2 runs) | Δ |
|---|---|---|---|
| peak anon RSS | 1703 MiB | 1487 MiB | **−12.7 %** |
| avg anon RSS | 1436 MiB | 1279 MiB | **−11.0 %** |
| source throughput | 212,656 rec | 217,934 rec | +2.5 % |

Memory drop is reproducible across runs; throughput is unaffected (and slightly higher).

## Does this PR introduce any user-facing change?

No new flags.
Container log now prints one line at startup, e.g.:

```
jemalloc: setting MALLOC_CONF=narenas:4 (detected 1 CPUs)
```

Users who set `narenas` in `MALLOC_CONF` themselves see their value preserved unchanged.
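
For readers who want to follow the detection logic without opening the diff, here is a minimal sketch of the quota lookup and `MALLOC_CONF` merge described under "What changes were proposed". It is not the patch itself. The function names (`detect_quota_cpus`, `build_malloc_conf`, `configure_jemalloc_arenas`) and the overridable cgroup root argument are hypothetical. The `4 * CPUs` multiplier is inferred from the sample log line (`narenas:4` for 1 detected CPU), and rounding fractional quotas up is an assumption.

```shell
# Hypothetical sketch; names and the root-directory argument are illustrative only.

# Print the CPU count implied by the cgroup CPU quota (rounded up).
# Returns non-zero and prints nothing when no quota is set or readable.
detect_quota_cpus() {
    local root="${1:-/sys/fs/cgroup}"
    local quota=""
    local period=""
    if [ -r "$root/cpu.max" ]; then
        # cgroup v2: file holds "<quota> <period>" or "max <period>"
        read -r quota period < "$root/cpu.max" || true
    elif [ -r "$root/cpu/cpu.cfs_quota_us" ] && [ -r "$root/cpu/cpu.cfs_period_us" ]; then
        # cgroup v1: quota is -1 when the container is unlimited
        read -r quota < "$root/cpu/cpu.cfs_quota_us" || true
        read -r period < "$root/cpu/cpu.cfs_period_us" || true
    fi
    # Reject empty, "max", negative, or otherwise non-numeric values
    case "$quota" in
        ''|max|*[!0-9]*) return 1 ;;
    esac
    case "$period" in
        ''|0|*[!0-9]*) return 1 ;;
    esac
    # Ceiling division (assumption): a 1.5-CPU quota counts as 2 CPUs
    local cpus=$(( (quota + period - 1) / period ))
    [ "$cpus" -ge 1 ] || cpus=1
    echo "$cpus"
}

# Merge narenas into an existing MALLOC_CONF value.
# $1 = existing MALLOC_CONF (may be empty), $2 = detected CPU count
build_malloc_conf() {
    local existing="$1"
    local cpus="$2"
    # A user-supplied narenas wins; leave the value untouched
    case ",$existing," in
        *,narenas:*) echo "$existing"; return 0 ;;
    esac
    local narenas=$(( 4 * cpus ))
    if [ -n "$existing" ]; then
        echo "$existing,narenas:$narenas"   # append, do not overwrite
    else
        echo "narenas:$narenas"
    fi
}

# Entry point: cgroup quota first, then fall back to nproc.
configure_jemalloc_arenas() {
    local cpus
    cpus="$(detect_quota_cpus /sys/fs/cgroup)" || cpus="$(nproc)"
    MALLOC_CONF="$(build_malloc_conf "${MALLOC_CONF:-}" "$cpus")"
    export MALLOC_CONF
    echo "jemalloc: setting MALLOC_CONF=$MALLOC_CONF (detected $cpus CPUs)"
}
```

In this sketch, `configure_jemalloc_arenas` would be called once before the Flink process is launched. Taking the cgroup root as an argument lets the lookup be exercised against a temporary directory instead of the live `/sys/fs/cgroup`.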
