[ 
https://issues.apache.org/jira/browse/FLINK-18353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17140682#comment-17140682
 ] 

Andrey Zagrebin commented on FLINK-18353:
-----------------------------------------

I agree we should add this to migration guide if we decide to keep the limit. 
The release notes for JM already contain this, we can also add this to TM 
releases notes.

Regarding whether to limit direct memory, we could set the limit only if user 
explicitly configures the direct memory limit. Then user can set the option to 
investigate OOMs. By default, we could account for the off-heap value in total 
memory but not set the JVM limit. This is not aligned with TM but not sure 
whether it is of much value for users (expecting that the limit is set by 
default while investigating OOMs? - maybe)

The safest option is to keep the limit as already mentioned. On the other hand, 
I am ok to remove the limit if we agree that the probability of the leak is 
low. Flink does not use direct memory explicitly in JM, only Akka/Netty. The 
probability of leak is low there. The user code will probably not use the 
direct memory heavily in JM hooks but we cannot control leaks there.

It could be the same for Metaspace but here Flink code is probably the main 
consumer of the Metaspace with no leaks. Then the Metaspace size should be 
almost the same for all use cases. Therefore, we can keep the limit for safety.

> [1.11.0] maybe document jobmanager behavior change regarding 
> -XX:MaxDirectMemorySize
> ------------------------------------------------------------------------------------
>
>                 Key: FLINK-18353
>                 URL: https://issues.apache.org/jira/browse/FLINK-18353
>             Project: Flink
>          Issue Type: Improvement
>          Components: Documentation, Runtime / Configuration
>    Affects Versions: 1.11.0
>            Reporter: Steven Zhen Wu
>            Priority: Major
>
> I know FLIP-116 (Unified Memory Configuration for Job Managers) is introduced 
> in 1.11. That does cause a small behavior change regarding 
> `-XX:MaxDirectMemorySize`. Previously, jobmanager don't set JVM arg 
> `-XX:MaxDirectMemorySize`, which means JVM can use up to -`Xmx` size for 
> direct memory. Now `-XX:MaxDirectMemorySize` is always set to 
> [jobmanager.memory.off-heap.size|https://ci.apache.org/projects/flink/flink-docs-master/ops/config.html#jobmanager-memory-off-heap-size]
>  config (default 128 mb).
>  
> {{It is possible for jobmanager to get "java.lang.OufOfMemoryError: Direct 
> Buffer Memory" without tuning 
> }}{{[jobmanager.memory.off-heap.size|https://ci.apache.org/projects/flink/flink-docs-master/ops/config.html#jobmanager-memory-off-heap-size]}}
>  especially for high-parallelism jobs. Previously, no tuning needed.
>  
> Maybe we should point out the behavior change in the migration guide?
> [https://ci.apache.org/projects/flink/flink-docs-master/ops/memory/mem_migration.html#migrate-job-manager-memory-configuration]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to