[ 
https://issues.apache.org/jira/browse/FLINK-13477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Till Rohrmann closed FLINK-13477.
---------------------------------
    Resolution: Duplicate

This ticket has been resolved as part of FLIP-49.

> Containerized TaskManager killed because of lack of memory overhead
> -------------------------------------------------------------------
>
>                 Key: FLINK-13477
>                 URL: https://issues.apache.org/jira/browse/FLINK-13477
>             Project: Flink
>          Issue Type: Improvement
>          Components: Deployment / Mesos, Deployment / YARN
>    Affects Versions: 1.9.0
>            Reporter: Benoit Hanotte
>            Priority: Minor
>              Labels: pull-request-available
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, the `-XX:MaxDirectMemorySize` parameter is set as:
> `MaxDirectMemorySize = containerMemoryMB - heapSizeMB`
> (see 
> [https://github.com/apache/flink/blob/7fec4392b21b07c69ba15ea554731886f181609e/flink-runtime/src/main/java/org/apache/flink/runtime/clusterframework/ContaineredTaskManagerParameters.java#L162])
> However, as explained at
> https://docs.oracle.com/javase/8/docs/technotes/tools/unix/java.html,
> `MaxDirectMemorySize` only limits the memory that can be used for direct
> buffers. The total off-heap memory used by the process can therefore
> exceed that value, in which case Mesos or YARN kills the container for
> exceeding its allocated memory.
> In addition, users might want to allocate off-heap memory through native
> code, in which case they will want to keep some of the container memory
> free and unallocated by Flink.
> To work around this issue, we currently set the following parameter:
> {code:java}
> -Dcontainerized.taskmanager.env.FLINK_ENV_JAVA_OPTS='-XX:MaxDirectMemorySize=600m'
> {code}
> which overrides the value that Flink picks (744M in this case) with a lower
> one to keep some overhead memory in the TaskManager containers. However, this
> is an "ugly" hack: it circumvents the memory allocation logic that Flink
> performs and bypasses the sanity checks done in
> `ContaineredTaskManagerParameters`.
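
The arithmetic described above can be sketched as follows. This is only an illustration, not Flink's actual code: the class name, both method names, and the `overheadMB` parameter are hypothetical. The container and heap sizes (1024 MB and 280 MB) are assumed values, chosen only so that the first result matches the 744M mentioned in the description; the ticket does not state the real ones.

```java
public final class DirectMemorySketch {

    // Formula quoted in the issue: all container memory that is not heap
    // becomes the direct memory limit, leaving no room for other off-heap use.
    public static long currentMaxDirectMemoryMB(long containerMemoryMB, long heapSizeMB) {
        return containerMemoryMB - heapSizeMB;
    }

    // Hypothetical variant: hold back an explicit overhead reserve so that
    // native allocations and JVM internals have space inside the container.
    public static long maxDirectMemoryWithOverheadMB(
            long containerMemoryMB, long heapSizeMB, long overheadMB) {
        long remainingMB = containerMemoryMB - heapSizeMB - overheadMB;
        if (remainingMB <= 0) {
            throw new IllegalArgumentException(
                "Heap and overhead leave no memory for direct buffers");
        }
        return remainingMB;
    }

    public static void main(String[] args) {
        long containerMemoryMB = 1024; // assumed value, not from the ticket
        long heapSizeMB = 280;         // assumed value, not from the ticket

        // Prints "-XX:MaxDirectMemorySize=744m"
        System.out.println("-XX:MaxDirectMemorySize="
            + currentMaxDirectMemoryMB(containerMemoryMB, heapSizeMB) + "m");

        // Reserving 144 MB reproduces the manual 600m override.
        // Prints "-XX:MaxDirectMemorySize=600m"
        System.out.println("-XX:MaxDirectMemorySize="
            + maxDirectMemoryWithOverheadMB(containerMemoryMB, heapSizeMB, 144) + "m");
    }
}
```

Under these assumed numbers, the manual override to 600m amounts to keeping 144 MB of the container outside the direct memory limit. A dedicated overhead setting would express that intent without bypassing the existing checks.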



--
This message was sent by Atlassian Jira
(v8.3.4#803005)