leekeiabstraction opened a new pull request, #266:
URL: https://github.com/apache/flink-docker/pull/266

   ## What changes were proposed in this pull request?
   
   `docker-entrypoint.sh` already loads jemalloc via `LD_PRELOAD` but leaves
   `narenas` at its default of `4 * ncpus`. `ncpus` is read from
   `/proc/cpuinfo`, which reflects the **host** CPU count — not the
   container's CPU limit. On a large host running a CPU-limited container
   this over-provisions arenas, and because each idle arena holds dirty
   pages until `dirty_decay_ms` elapses, anon RSS inflates beyond what
   jemalloc actually needs.
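   
   To give a sense of scale, here is the arithmetic for a made-up case. The
   64-CPU host and the 1-CPU limit are hypothetical numbers chosen for
   illustration, not measurements from this PR:

   ```shell
   # Hypothetical sizes for illustration only.
   host_cpus=64        # CPUs visible on the host
   container_cpus=1    # CPU limit applied to the container

   default_narenas=$(( 4 * host_cpus ))       # jemalloc default: 4 * ncpus
   sized_narenas=$(( 4 * container_cpus ))    # same ratio, sized to the limit

   echo "default narenas: $default_narenas, quota-sized narenas: $sized_narenas"
   ```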
   
   This patch derives `narenas` from the container's cgroup CPU quota:
   
   - cgroup v2 → `/sys/fs/cgroup/cpu.max`
   - cgroup v1 → `/sys/fs/cgroup/cpu/cpu.cfs_quota_us` divided by `cpu.cfs_period_us`
   - Fallback → `nproc` (handles cpuset-pinned pods and unlimited setups)
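   
   The lookup order above could be sketched roughly as follows. This is a
   minimal illustration, not the code in this PR: the function name
   `detect_container_cpus` and the `CGROUP_ROOT` override are hypothetical,
   and rounding fractional quotas up to a whole CPU is an assumption.

   ```shell
   # Hypothetical helper for illustration; not the code from this PR.
   # CGROUP_ROOT is an assumed override so the logic can be exercised
   # against fake files outside a container.
   detect_container_cpus() {
       local root="${CGROUP_ROOT:-/sys/fs/cgroup}"
       local quota="" period=""

       if [ -r "$root/cpu.max" ]; then
           # cgroup v2: a single line, "<quota> <period>" or "max <period>"
           read -r quota period < "$root/cpu.max" || true
       elif [ -r "$root/cpu/cpu.cfs_quota_us" ] && [ -r "$root/cpu/cpu.cfs_period_us" ]; then
           # cgroup v1: quota is -1 when no limit is set
           read -r quota < "$root/cpu/cpu.cfs_quota_us" || true
           read -r period < "$root/cpu/cpu.cfs_period_us" || true
       fi

       # No usable quota (missing, "max", -1, or non-numeric): fall back to nproc
       case "$quota" in ''|max|*[!0-9]*) nproc; return ;; esac
       case "$period" in ''|0|*[!0-9]*) nproc; return ;; esac

       # Ceiling division, so a 1.5-CPU quota counts as 2; never report less than 1
       local cpus=$(( (quota + period - 1) / period ))
       [ "$cpus" -ge 1 ] || cpus=1
       echo "$cpus"
   }
   ```

   Taking the cgroup root as a parameter keeps the parsing testable with
   temporary files; inside a real container the default path applies.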
   
   It then sets `MALLOC_CONF=narenas:<N>`, deferring to any user-supplied
   `narenas` in `MALLOC_CONF` and appending (rather than overwriting) other
   user-supplied `MALLOC_CONF` values.
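   
   The merge rule described above might look something like this sketch. The
   helper name `merge_malloc_conf` is hypothetical, and the factor of 4
   arenas per CPU is an assumption inferred from the sample log line
   (`narenas:4` for 1 detected CPU), not something confirmed by the patch:

   ```shell
   # Hypothetical helper for illustration; not the code from this PR.
   #   $1 = existing MALLOC_CONF value (may be empty)
   #   $2 = detected CPU count
   # Prints the MALLOC_CONF value that should be exported.
   merge_malloc_conf() {
       local existing="$1"
       local narenas=$(( 4 * $2 ))   # assumed ratio: 4 arenas per CPU

       # Wrap in commas so "narenas:" only matches at the start of an option
       case ",$existing," in
           *,narenas:*) echo "$existing" ;;                   # user chose narenas: keep as-is
           ,,)          echo "narenas:$narenas" ;;            # nothing set yet
           *)           echo "$existing,narenas:$narenas" ;;  # append to other options
       esac
   }
   ```

   An entrypoint could then run, for a container detected as having 1 CPU,
   `export MALLOC_CONF="$(merge_malloc_conf "${MALLOC_CONF:-}" 1)"`.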
   
   ## Why are the changes needed?
   
   `nproc` honors `cpuset` (sched affinity) but **not** CPU quotas, so on
   its own it cannot detect the limit: Docker `--cpus=N` and Kubernetes CPU
   limits are both enforced as quotas, not cpusets. Reading the cgroup
   files directly is the only reliable signal inside a container.
   
   ## Verifying this change
   
   Reproduced on Docker Desktop with 4 TaskManagers per cluster (2 GB
   process size, 1 CPU each, RocksDB state backend, datagen → temporal
   join → blackhole, 5-minute sample windows, two runs each):
   
   | metric | OSS Flink 2.2.1 (mean of 2 runs) | patched (mean of 2 runs) | Δ |
   |---|---|---|---|
   | peak anon RSS | 1703 MiB | 1487 MiB | **−12.7 %** |
   | avg anon RSS | 1436 MiB | 1279 MiB | **−11.0 %** |
   | source throughput | 212,656 rec | 217,934 rec | +2.5 % |
   
   The memory drop is reproducible across runs; throughput does not regress
   (it is slightly higher).
   
   ## Does this PR introduce any user-facing change?
   
   No new flags. Container log now prints one line at startup, e.g.:
   
   ```
   jemalloc: setting MALLOC_CONF=narenas:4 (detected 1 CPUs)
   ```
   
   Users who set `narenas` in `MALLOC_CONF` themselves keep their value
   unchanged.

