[
https://issues.apache.org/jira/browse/FLINK-18353?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17140682#comment-17140682
]
Andrey Zagrebin edited comment on FLINK-18353 at 6/19/20, 5:12 PM:
-------------------------------------------------------------------
I agree we should add this to the migration guide if we decide to keep the
limit. The release notes for the JM already contain this; we can also add it
to the TM release notes.
Regarding whether to limit direct memory, we could set the limit only if the
user explicitly configures the off-heap memory. The user could then set the
option to investigate OOMs. By default, we could account for the off-heap
value in the total memory but not set the JVM limit. This is not aligned with
the TM, though. I am not sure how big a problem that is: expecting the limit
to be set by default while investigating OOMs? Maybe. We could just align
them, but the risk for user code is bigger in the TM, imo.
The safest option is to keep the limit, as already mentioned. On the other
hand, I am fine with removing the limit if we agree that the probability of a
leak is low. Flink does not use direct memory explicitly in the JM, only
Akka/Netty do, and the probability of a leak there is low. User code will
probably not use direct memory heavily in JM hooks, but we cannot control
leaks there.
The same could hold for Metaspace, but there Flink code is probably the main
consumer, with no leaks, so the Metaspace size should be almost the same for
all use cases. Therefore, we can keep that limit for safety.
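As a hedged illustration of the kind of OOM investigation discussed above
(this is plain JDK code, not Flink code), the direct-memory consumption that
`-XX:MaxDirectMemorySize` caps can be inspected at runtime via the standard
`BufferPoolMXBean` for the "direct" pool:

```java
import java.lang.management.BufferPoolMXBean;
import java.lang.management.ManagementFactory;
import java.nio.ByteBuffer;
import java.util.List;

public class DirectMemoryProbe {
    public static void main(String[] args) {
        // Allocate some direct memory, as Netty/Akka would inside the JM.
        ByteBuffer buf = ByteBuffer.allocateDirect(16 * 1024 * 1024);

        // The "direct" buffer pool MXBean reports the usage that is bounded
        // by -XX:MaxDirectMemorySize (when that flag is set).
        List<BufferPoolMXBean> pools =
                ManagementFactory.getPlatformMXBeans(BufferPoolMXBean.class);
        for (BufferPoolMXBean pool : pools) {
            if ("direct".equals(pool.getName())) {
                System.out.println("direct buffers: " + pool.getCount());
                System.out.println("bytes used:     " + pool.getMemoryUsed());
            }
        }
    }
}
```

Watching this number grow without bound is one way to tell a genuine leak
apart from a limit that is simply configured too low.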
> [1.11.0] maybe document jobmanager behavior change regarding
> -XX:MaxDirectMemorySize
> ------------------------------------------------------------------------------------
>
> Key: FLINK-18353
> URL: https://issues.apache.org/jira/browse/FLINK-18353
> Project: Flink
> Issue Type: Improvement
> Components: Documentation, Runtime / Configuration
> Affects Versions: 1.11.0
> Reporter: Steven Zhen Wu
> Priority: Major
>
> I know FLIP-116 (Unified Memory Configuration for Job Managers) was introduced
> in 1.11. That causes a small behavior change regarding
> `-XX:MaxDirectMemorySize`. Previously, the jobmanager did not set the JVM arg
> `-XX:MaxDirectMemorySize`, which meant the JVM could use up to the `-Xmx` size
> for direct memory. Now `-XX:MaxDirectMemorySize` is always set to the
> [jobmanager.memory.off-heap.size|https://ci.apache.org/projects/flink/flink-docs-master/ops/config.html#jobmanager-memory-off-heap-size]
> config (default 128 mb).
>
> It is possible for the jobmanager to get "java.lang.OutOfMemoryError: Direct
> buffer memory" without tuning
> [jobmanager.memory.off-heap.size|https://ci.apache.org/projects/flink/flink-docs-master/ops/config.html#jobmanager-memory-off-heap-size],
> especially for high-parallelism jobs. Previously, no tuning was needed.
>
> Maybe we should point out the behavior change in the migration guide?
> [https://ci.apache.org/projects/flink/flink-docs-master/ops/memory/mem_migration.html#migrate-job-manager-memory-configuration]
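The workaround the issue describes is raising the off-heap option, which in
1.11+ is passed to the JM JVM as `-XX:MaxDirectMemorySize`. A hedged sketch
of a `flink-conf.yaml` fragment (the value is illustrative, not a
recommendation):

```yaml
# flink-conf.yaml -- illustrative value only; size it from observed usage.
# In Flink 1.11+, this value becomes the JM's -XX:MaxDirectMemorySize.
jobmanager.memory.off-heap.size: 256m
```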
--
This message was sent by Atlassian Jira
(v8.3.4#803005)