[ https://issues.apache.org/jira/browse/FLINK-35489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17850635#comment-17850635 ]
Rui Fan commented on FLINK-35489:
---------------------------------
Hi [~nfraison.datadog]
{quote}because we are relying on some javaagent performing some memory
allocation outside of the JVM (rely on some C bindings).
{quote}
Do you mean your job needs some native memory, and you want to increase
taskmanager.memory.managed.size to reserve some native memory so that the TM's
total memory usage does not exceed the total memory limit?
If yes (IIUC), Flink managed memory[1] is mainly used by the RocksDB state
backend in streaming jobs. If the native memory is not enough, you can increase
the Flink JVM overhead memory instead. You can check the Flink TM memory model
here[2].
It means you can try to increase these options (see the sketch after this list):
* taskmanager.memory.jvm-overhead.min [3]
* taskmanager.memory.jvm-overhead.max [4]
* taskmanager.memory.jvm-overhead.fraction [5]
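For illustration only, a rough sketch of these settings (the values here are
placeholders, not a recommendation; size them according to how much native
memory your javaagent actually allocates):
{code:java}
# Reserve extra native memory through JVM overhead instead of managed memory.
# Placeholder values, adjust to your workload.
taskmanager.memory.jvm-overhead.min: 512mb
taskmanager.memory.jvm-overhead.max: 1gb
taskmanager.memory.jvm-overhead.fraction: 0.2{code}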
Feel free to correct me if anything is wrong, thanks. :)
[1] https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/memory/mem_setup_tm/#managed-memory
[2] https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/memory/mem_setup_tm/#detailed-memory-model
[3] https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#taskmanager-memory-jvm-overhead-min
[4] https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#taskmanager-memory-jvm-overhead-max
[5] https://nightlies.apache.org/flink/flink-docs-master/docs/deployment/config/#taskmanager-memory-jvm-overhead-fraction
> Add capability to set min taskmanager.memory.managed.size when enabling
> autotuning
> ----------------------------------------------------------------------------------
>
> Key: FLINK-35489
> URL: https://issues.apache.org/jira/browse/FLINK-35489
> Project: Flink
> Issue Type: Improvement
> Components: Kubernetes Operator
> Affects Versions: 1.8.0
> Reporter: Nicolas Fraison
> Priority: Major
>
> We have enabled the autotuning feature on one of our Flink jobs with the below
> config:
> {code:java}
> # Autoscaler configuration
> job.autoscaler.enabled: "true"
> job.autoscaler.stabilization.interval: 1m
> job.autoscaler.metrics.window: 10m
> job.autoscaler.target.utilization: "0.8"
> job.autoscaler.target.utilization.boundary: "0.1"
> job.autoscaler.restart.time: 2m
> job.autoscaler.catch-up.duration: 10m
> job.autoscaler.memory.tuning.enabled: true
> job.autoscaler.memory.tuning.overhead: 0.5
> job.autoscaler.memory.tuning.maximize-managed-memory: true{code}
> During a scale down, the autotuning decided to give all the memory to the JVM
> (scaling the heap by 2), setting taskmanager.memory.managed.size to 0b.
> Here is the config that was computed by the autotuning for a TM running on a
> 4GB pod:
> {code:java}
> taskmanager.memory.network.max: 4063232b
> taskmanager.memory.network.min: 4063232b
> taskmanager.memory.jvm-overhead.max: 433791712b
> taskmanager.memory.task.heap.size: 3699934605b
> taskmanager.memory.framework.off-heap.size: 134217728b
> taskmanager.memory.jvm-metaspace.size: 22960020b
> taskmanager.memory.framework.heap.size: "0 bytes"
> taskmanager.memory.flink.size: 3838215565b
> taskmanager.memory.managed.size: 0b {code}
> This has led to issues starting the TM because we rely on a javaagent that
> performs some memory allocation outside of the JVM (it relies on some C
> bindings).
> Tuning the overhead or disabling scale-down-compensation.enabled could have
> helped for that particular event, but it can lead to other issues, as it could
> result in too little heap size being computed.
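> As a rough sketch of those workaround knobs (the overhead key appears in our
> config above; the full key for the scale-down compensation switch is assumed
> here and should be double-checked against the operator docs; values are
> placeholders):
> {code:java}
> # Leave more slack for non-JVM memory during tuning (placeholder value).
> job.autoscaler.memory.tuning.overhead: 0.7
> # Assumed full key for the scale-down compensation switch mentioned above.
> job.autoscaler.memory.tuning.scale-down-compensation.enabled: false{code}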
> It would be interesting to be able to set a minimum memory.managed.size that
> would be taken into account by the autotuning.
> What do you think about this? Do you think that some other specific config
> should have been applied to avoid this issue?
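> To make the ask concrete, a purely hypothetical option could look like the
> snippet below (the key job.autoscaler.memory.tuning.managed.min is made up
> for illustration and does not exist today):
> {code:java}
> # Hypothetical: floor below which autotuning would never shrink managed memory.
> job.autoscaler.memory.tuning.managed.min: 256mb{code}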