Grzegorz Liter created FLINK-38212:
--------------------------------------
Summary: OOM during savepoint caused by a potential memory leak
in RocksDB related to jemalloc
Key: FLINK-38212
URL: https://issues.apache.org/jira/browse/FLINK-38212
Project: Flink
Issue Type: Bug
Components: Runtime / Checkpointing
Affects Versions: 1.20.2, 2.1.0
Environment: Flink 2.1.0 running in Application mode with Flink
Operator 1.12.1.
Memory and savepoint related settings:
{code:java}
env.java.opts.taskmanager: ' -XX:+UnlockExperimentalVMOptions
-XX:+UseStringDeduplication
-XX:+AlwaysPreTouch -XX:G1HeapRegionSize=16m
-Xlog:gc*:file=/tmp/gc.log:time,uptime,level,tags
-XX:SurvivorRatio=6 -XX:G1NewSizePercent=40'
execution.checkpointing.max-concurrent-checkpoints: "1"
execution.checkpointing.snapshot-compression: "true"
fs.s3a.aws.credentials.provider:
com.amazonaws.auth.WebIdentityTokenCredentialsProvider
fs.s3a.block.size:
fs.s3a.experimental.input.fadvise: sequential
fs.s3a.path.style.access: "true"
state.backend.incremental: "true"
state.backend.type: rocksdb
state.checkpoints.dir: s3p://bucket/checkpoints
state.savepoints.dir: s3p://bucket/savepoints
taskmanager.memory.jvm-overhead.fraction: "0.1"
taskmanager.memory.jvm-overhead.max: 6g
taskmanager.memory.managed.fraction: "0.4"
taskmanager.memory.network.fraction: "0.05"
taskmanager.network.memory.buffer-debloat.enabled: "true"
taskmanager.numberOfTaskSlots: "12"
...
resource:
memory: 16g{code}
Reporter: Grzegorz Liter
I am running a job with a snapshot size of about 17 GB, with compression enabled. I
have observed that savepoints often fail because the TM gets killed by
Kubernetes for exceeding the memory limit, even on a pod with a 30 GB memory
limit assigned.
Neither Flink metrics nor detailed VM metrics taken with `jcmd <PID> VM.native_memory
detail` indicate any unusual memory increase. The consumed memory is
visible only in Kubernetes metrics and RSS.
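For reference, the native memory check looked roughly like this; a minimal sketch assuming Native Memory Tracking was enabled on the TaskManager JVM and the commands are run inside the TM pod (`<PID>` is the TaskManager process id):
{code:java}
# Assumed setup: -XX:NativeMemoryTracking=detail added to env.java.opts.taskmanager

# Summary of JVM-tracked native memory (heap, metaspace, threads, GC, internal, ...)
jcmd <PID> VM.native_memory summary

# Detailed breakdown with reserved/committed per memory region
jcmd <PID> VM.native_memory detail

# Compare against the pod-level usage that Kubernetes accounts for
cat /sys/fs/cgroup/memory/memory.usage_in_bytes   # cgroup v1; use memory.current on cgroup v2
{code}
In my case the JVM-side numbers stayed flat while the cgroup/RSS numbers kept growing, which is what pointed towards native allocations outside the JVM's accounting.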
When enough memory is set (plus potentially enough JVM overhead) to leave
some breathing room, one snapshot can be taken, but taking subsequent full
snapshots reliably leads to OOM.
This documentation:
[switching-the-memory-allocator|https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/resource-providers/standalone/docker/#switching-the-memory-allocator]
led me to try
{code:java}
MALLOC_ARENA_MAX=1
DISABLE_JEMALLOC=true {code}
This configuration made savepoints reliably pass without OOM. I also
tried setting only one of these options at a time, but that did not fix the
issue.
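For context, this is roughly how the two variables were applied; a minimal sketch assuming the official Flink image (whose entrypoint honours DISABLE_JEMALLOC) and the Flink Kubernetes Operator's FlinkDeployment podTemplate, with the deployment name as a placeholder:
{code:java}
# Hedged sketch: FlinkDeployment excerpt setting the allocator-related env vars
# on the main Flink container.
apiVersion: flink.apache.org/v1beta1
kind: FlinkDeployment
metadata:
  name: my-job            # placeholder
spec:
  podTemplate:
    spec:
      containers:
        - name: flink-main-container
          env:
            # Limit glibc malloc to a single arena to reduce RSS growth/fragmentation
            - name: MALLOC_ARENA_MAX
              value: "1"
            # Official Flink image entrypoint skips preloading jemalloc when this is set
            - name: DISABLE_JEMALLOC
              value: "true"
{code}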
I also tried scaling the pod down to 16 GB of memory; with these options the
savepoint was reliably created without any issue. Without them, every savepoint
fails.