[
https://issues.apache.org/jira/browse/FLINK-14952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16988046#comment-16988046
]
Piotr Nowojski commented on FLINK-14952:
----------------------------------------
Initially the {{BoundedBlockingSubpartition}} was supposed to work purely
using mmap:
1. Results are produced and written to an mmap'ed file, from which they are
also read.
This had some blocking issues and was replaced by:
2. Results are written to a file directly, but read using mmap (improves
performance when reading the same partition multiple times). This is what we
are calling "mmap"/"MMAP" mode in this ticket.
3. Results are written to a file and read from a file, without using mmap at
all. This is the "file" or "FILE-FILE" mode (a sketch contrasting the two read
paths follows below).
> Yarn containers can exceed physical memory limits when using
> BoundedBlockingSubpartition.
> -----------------------------------------------------------------------------------------
>
> Key: FLINK-14952
> URL: https://issues.apache.org/jira/browse/FLINK-14952
> Project: Flink
> Issue Type: Bug
> Components: Deployment / YARN, Runtime / Network
> Affects Versions: 1.9.1
> Reporter: Piotr Nowojski
> Priority: Blocker
> Fix For: 1.10.0
>
>
> As [reported by a user on the user mailing
> list|http://apache-flink-user-mailing-list-archive.2336050.n4.nabble.com/CoGroup-SortMerger-performance-degradation-from-1-6-4-1-9-1-td31082.html],
> the combination of {{BoundedBlockingSubpartition}} with YARN containers
> can cause the container to exceed its memory limits.
> {quote}2019-11-19 12:49:23,068 INFO org.apache.flink.yarn.YarnResourceManager
> - Closing TaskExecutor connection container_e42_1574076744505_9444_01_000004
> because: Container
> [pid=42774,containerID=container_e42_1574076744505_9444_01_000004] is running
> beyond physical memory limits. Current usage: 12.0 GB of 12 GB physical
> memory used; 13.9 GB of 25.2 GB virtual memory used. Killing container.
> {quote}
> This is probably happening because the memory usage of mmap is neither capped
> nor accounted for by the configured memory limits. YARN, however, does track
> this memory usage, and once Flink exceeds some threshold, the container is
> killed.
> The workaround is to override the default value and force Flink not to use
> mmap, by setting a secret (🤫) config option:
> {noformat}
> taskmanager.network.bounded-blocking-subpartition-type: file
> {noformat}
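> A minimal sketch of setting the same option programmatically, e.g. for a local
> test environment; this assumes the generic string setter on {{Configuration}}
> rather than a typed {{ConfigOption}}:
> {code:java}
> import org.apache.flink.configuration.Configuration;
> import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
>
> // Sketch only: force the non-mmap ("file") spilling mode. In a real YARN
> // deployment the flink-conf.yaml entry above is the intended way to set this.
> Configuration config = new Configuration();
> config.setString("taskmanager.network.bounded-blocking-subpartition-type", "file");
>
> // Example usage: hand the configuration to a local environment.
> StreamExecutionEnvironment env =
>         StreamExecutionEnvironment.createLocalEnvironment(1, config);
> {code}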