[ 
https://issues.apache.org/jira/browse/FLINK-14952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16984103#comment-16984103
 ] 

Yingjie Cao commented on FLINK-14952:
-------------------------------------

> Doesn't this defeat the purpose of using mmap in the first place? For it to 
> be beneficial two different readers of the mmap'ed region would have to 
> closely follow one another, right?

You are right. If we decide to manage the mmapped regions ourselves, we need to 
decide when to recycle them and which regions to recycle. Implementing such a 
memory management algorithm is possible but can be complicated; currently the OS 
does it for us.
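
To make the idea concrete, here is a minimal, hypothetical sketch (not Flink's 
actual implementation) of what managing the mmapped regions could look like: map 
fixed-size slices of the partition file on demand and keep only the most 
recently used slices referenced. Explicitly unmapping a MappedByteBuffer needs 
JDK-internal APIs, so this sketch simply drops evicted references and relies on 
GC to release the mappings eventually.

{code:java}
// Hypothetical sketch of a region-recycling policy for mmapped partition files.
// Not Flink code; class and constant names are made up for illustration.
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.LinkedHashMap;
import java.util.Map;

public class MappedRegionCache {

    private static final long REGION_SIZE = 64 * 1024 * 1024; // 64 MiB per mapped slice

    private final FileChannel channel;
    private final int maxMappedRegions;

    // LRU map from region index to its mapped slice; the eldest entry is evicted
    // once more than maxMappedRegions slices are mapped at the same time.
    private final LinkedHashMap<Long, MappedByteBuffer> regions =
            new LinkedHashMap<Long, MappedByteBuffer>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<Long, MappedByteBuffer> eldest) {
                    return size() > maxMappedRegions;
                }
            };

    public MappedRegionCache(Path file, int maxMappedRegions) throws IOException {
        this.channel = FileChannel.open(file, StandardOpenOption.READ);
        this.maxMappedRegions = maxMappedRegions;
    }

    /** Returns the mapped slice containing the given file offset, mapping it lazily. */
    public MappedByteBuffer regionFor(long offset) throws IOException {
        long regionIndex = offset / REGION_SIZE;
        MappedByteBuffer region = regions.get(regionIndex);
        if (region == null) {
            long start = regionIndex * REGION_SIZE;
            long length = Math.min(REGION_SIZE, channel.size() - start);
            region = channel.map(FileChannel.MapMode.READ_ONLY, start, length);
            regions.put(regionIndex, region);
        }
        return region;
    }
}
{code}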

Maybe there is another choice we can consider: using the FILE-FILE mode in the 
first place. The FILE-FILE mode can also leverage the OS page cache; the only 
problem is that it uses some unpooled heap memory for reading, which carries a 
risk of OOM. If we can manage that memory in the future, there should be no 
problem with the FILE-FILE mode.
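
As a rough illustration of that direction (again a hypothetical sketch, not the 
actual BoundedBlockingSubpartition code), reads in a FILE-style mode could go 
through one bounded, reused direct buffer instead of fresh unpooled heap 
buffers, so reader-side memory stays capped regardless of partition size:

{code:java}
// Hypothetical sketch: file-based reading with a single reused, bounded buffer.
// Class and method names are made up for illustration.
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class BoundedFileReader implements AutoCloseable {

    private final FileChannel channel;
    private final ByteBuffer readBuffer; // reused for every read; the only buffer we own
    private long position;

    public BoundedFileReader(Path file, int bufferSize) throws IOException {
        this.channel = FileChannel.open(file, StandardOpenOption.READ);
        this.readBuffer = ByteBuffer.allocateDirect(bufferSize);
    }

    /** Reads the next chunk into the shared buffer; returns null at end of file. */
    public ByteBuffer nextChunk() throws IOException {
        readBuffer.clear();
        int read = channel.read(readBuffer, position);
        if (read <= 0) {
            return null;
        }
        position += read;
        readBuffer.flip();
        return readBuffer;
    }

    @Override
    public void close() throws IOException {
        channel.close();
    }
}
{code}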

> Yarn containers can exceed physical memory limits when using 
> BoundedBlockingSubpartition.
> -----------------------------------------------------------------------------------------
>
>                 Key: FLINK-14952
>                 URL: https://issues.apache.org/jira/browse/FLINK-14952
>             Project: Flink
>          Issue Type: Bug
>          Components: Deployment / YARN, Runtime / Network
>    Affects Versions: 1.9.1
>            Reporter: Piotr Nowojski
>            Priority: Blocker
>             Fix For: 1.10.0
>
>
> As [reported by a user on the user mailing 
> list|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/CoGroup-SortMerger-performance-degradation-from-1-6-4-1-9-1-td31082.html],
>  the combination of using {{BoundedBlockingSubpartition}} with YARN containers 
> can cause a YARN container to exceed its memory limits.
> {quote}2019-11-19 12:49:23,068 INFO org.apache.flink.yarn.YarnResourceManager 
> - Closing TaskExecutor connection container_e42_1574076744505_9444_01_000004 
> because: Container 
> [pid=42774,containerID=container_e42_1574076744505_9444_01_000004] is running 
> beyond physical memory limits. Current usage: 12.0 GB of 12 GB physical 
> memory used; 13.9 GB of 25.2 GB virtual memory used. Killing container.
> {quote}
> This is probably happening because the memory usage of mmap is neither capped 
> nor accounted for by the configured memory limits; however, YARN does track 
> this memory usage, and once Flink exceeds some threshold, the container is 
> killed.
> A workaround is to override the default value and force Flink not to use mmap 
> by setting a secret (🤫) config option:
> {noformat}
> taskmanager.network.bounded-blocking-subpartition-type: file
> {noformat}
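
For completeness, a small illustrative example (not from the ticket) of applying 
the same option programmatically for a local or test setup via Flink's standard 
Configuration API; for a cluster deployment the flink-conf.yaml entry quoted 
above is the way to go.

{code:java}
// Illustrative only: sets the quoted option when building a local environment.
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class FileModeWorkaround {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Force file-based (non-mmap) bounded blocking subpartitions.
        conf.setString("taskmanager.network.bounded-blocking-subpartition-type", "file");

        StreamExecutionEnvironment env =
                StreamExecutionEnvironment.createLocalEnvironment(1, conf);
        // ... build the bounded job as usual, then call env.execute() ...
    }
}
{code}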



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
