Grzegorz Liter created FLINK-38212:
--------------------------------------

             Summary: OOM during savepoint caused by potential memory leak issue in RocksDB related to jemalloc
                 Key: FLINK-38212
                 URL: https://issues.apache.org/jira/browse/FLINK-38212
             Project: Flink
          Issue Type: Bug
          Components: Runtime / Checkpointing
    Affects Versions: 1.20.2, 2.1.0
         Environment: Flink 2.1.0 running in Application mode with Flink Operator 1.12.1.
Memory and savepoint related settings:
{code:java}
env.java.opts.taskmanager: ' -XX:+UnlockExperimentalVMOptions -XX:+UseStringDeduplication -XX:+AlwaysPreTouch -XX:G1HeapRegionSize=16m -Xlog:gc*:file=/tmp/gc.log:time,uptime,level,tags -XX:SurvivorRatio=6 -XX:G1NewSizePercent=40'
execution.checkpointing.max-concurrent-checkpoints: "1"
execution.checkpointing.snapshot-compression: "true"
fs.s3a.aws.credentials.provider: com.amazonaws.auth.WebIdentityTokenCredentialsProvider
fs.s3a.block.size:
fs.s3a.experimental.input.fadvise: sequential
fs.s3a.path.style.access: "true"
state.backend.incremental: "true"
state.backend.type: rocksdb
state.checkpoints.dir: s3p://bucket/checkpoints
state.savepoints.dir: s3p://bucket/savepoints
taskmanager.memory.jvm-overhead.fraction: "0.1"
taskmanager.memory.jvm-overhead.max: 6g
taskmanager.memory.managed.fraction: "0.4"
taskmanager.memory.network.fraction: "0.05"
taskmanager.network.memory.buffer-debloat.enabled: "true"
taskmanager.numberOfTaskSlots: "12"
...
resource:
  memory: 16g
{code}
            Reporter: Grzegorz Liter


I am running a job with a snapshot size of about 17 GB and compression enabled. I have observed that savepoints often fail because the TM gets killed by Kubernetes for exceeding the memory limit of its pod, which had a 30 GB limit assigned.

Neither Flink metrics nor detailed VM metrics taken with `jcmd <PID> VM.native_memory detail` indicate any unusual memory increase (see the NMT sketch at the end of this description). The consumed memory is visible only in Kubernetes metrics and RSS.

When enough memory is set (plus, potentially, enough JVM overhead) to leave some breathing room, one snapshot can be taken, but taking subsequent full snapshots reliably leads to OOM.

This documentation: [switching-the-memory-allocator|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/standalone/docker/#switching-the-memory-allocator] led me to try
{code:java}
MALLOC_ARENA_MAX=1
DISABLE_JEMALLOC=true
{code}
With this configuration savepoints reliably pass without OOM (a deployment sketch is attached below). I tried setting only one of the two options at a time, but that did not fix the issue. I also tried scaling the pod down to 16 GB of memory; with both options set, savepoints were reliably created without any issue. Without them, every savepoint fails.
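A note on the NMT observation above: Native Memory Tracking only accounts for memory allocated by the JVM itself, so native allocations made by RocksDB through jemalloc/glibc do not show up in its report, which would explain the gap between the `jcmd` output and RSS. A minimal sketch of how the NMT data can be collected; adding the flag through `env.java.opts.taskmanager` is an assumption about this particular setup:
{code:java}
# Enable Native Memory Tracking for the TaskManager JVM (adds overhead, debugging only)
env.java.opts.taskmanager: '-XX:NativeMemoryTracking=detail ...'

# Inside the TaskManager pod: take a baseline, trigger a savepoint, then diff
jcmd <PID> VM.native_memory baseline
jcmd <PID> VM.native_memory detail.diff
{code}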
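For completeness, a sketch of how the two environment variables can be applied to the TaskManager pods through the Flink Kubernetes Operator. This assumes the official Flink image, whose entrypoint checks `DISABLE_JEMALLOC`, and the operator's pod template mechanism; the deployment name is a placeholder:
{code:yaml}
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: my-flink-job                     # placeholder name
spec:
  taskManager:
    podTemplate:
      spec:
        containers:
          - name: flink-main-container   # main container name expected by the operator
            env:
              - name: DISABLE_JEMALLOC   # entrypoint falls back to glibc malloc
                value: "true"
              - name: MALLOC_ARENA_MAX   # cap the number of glibc malloc arenas
                value: "1"
{code}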