[ https://issues.apache.org/jira/browse/FLINK-10884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
chunpinghe updated FLINK-10884:
-------------------------------
    Comment: was deleted

(was: What's your solution? By default, YARN checks the physical memory used by a container; you can disable this check by setting yarn.nodemanager.pmem-check-enabled to false. In your example, if your container uses too much off-heap memory (direct memory, or JNI malloc) and the total memory exceeds 3g, the container will be killed anyway.

So, if your container is always killed by the NodeManager, you should check whether the total memory you provided for it is insufficient or whether your code has a memory leak (mainly a native memory leak).)

> Flink on yarn TM container will be killed by nodemanager because of the exceeded physical memory.
> ----------------------------------------------------------------------------------------------------
>
>                 Key: FLINK-10884
>                 URL: https://issues.apache.org/jira/browse/FLINK-10884
>             Project: Flink
>          Issue Type: Bug
>          Components: Deployment / YARN, Runtime / Coordination
>    Affects Versions: 1.5.5, 1.6.2, 1.7.0
>         Environment: version : 1.6.2
> module : flink on yarn
> centos jdk1.8
> hadoop 2.7
>            Reporter: wgcn
>            Assignee: wgcn
>            Priority: Major
>              Labels: pull-request-available, yarn
>
> TM containers get killed by the NodeManager because they exceed their physical memory limit. I found that the launch context used to launch the TM container sets "container memory = heap memory + offHeapSizeMB" in the class org.apache.flink.runtime.clusterframework.ContaineredTaskManagerParameters (lines 160 to 166). I set a safety margin on the total memory the container uses.
> For example, if the container is limited to 3g of memory, the sum "heap memory + offHeapSizeMB" is set to 2.4g to prevent the container from being killed. Do we already have a ready-made solution, or can I commit mine?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
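The memory arithmetic discussed in this issue (container memory = heap memory + offHeapSizeMB, a safety margin that turns a 3g limit into a 2.4g budget, and the NodeManager killing the container once total usage exceeds the limit) can be sketched as follows. This is a hypothetical, simplified Java model, not Flink's ContaineredTaskManagerParameters code or YARN's actual container monitor: the class name ContainerMemoryBudget, its methods, and the 20% margin (chosen only to reproduce the 3g to 2.4g example) are assumptions for illustration. The kill check also treats heap, off-heap and native usage as one sum, whereas the real NodeManager measures the container's process tree.

```java
// Hypothetical sketch: NOT Flink's or YARN's actual implementation.
public final class ContainerMemoryBudget {

    /**
     * Returns the memory (MB) that heap + off-heap may use so that a fraction
     * of the container limit stays free for native/JVM overhead.
     * The margin fraction is an assumed tuning parameter.
     */
    static long budgetMb(long containerLimitMb, double safetyMarginFraction) {
        if (safetyMarginFraction < 0.0 || safetyMarginFraction >= 1.0) {
            throw new IllegalArgumentException("margin must be in [0, 1)");
        }
        return (long) Math.floor(containerLimitMb * (1.0 - safetyMarginFraction));
    }

    /**
     * Simplified model of the NodeManager physical-memory check: the container
     * is killed when heap + off-heap + native usage exceeds the limit.
     */
    static boolean wouldBeKilled(long heapMb, long offHeapMb, long nativeOverheadMb,
                                 long containerLimitMb) {
        return heapMb + offHeapMb + nativeOverheadMb > containerLimitMb;
    }

    public static void main(String[] args) {
        long limitMb = 3072;                   // 3g container, as in the issue
        long budget = budgetMb(limitMb, 0.2);  // assumed 20% margin, about 2.4g
        System.out.println("heap + off-heap budget: " + budget + " MB"); // 2457 MB

        // No margin: heap + off-heap fill the whole container, so any native usage tips it over.
        System.out.println(wouldBeKilled(2560, 512, 300, limitMb));          // true
        // With the margin, the same 300 MB of native overhead still fits.
        System.out.println(wouldBeKilled(budget - 512, 512, 300, limitMb));  // false
    }
}
```

Running main prints a budget of 2457 MB for a 3072 MB container, then true for the configuration without a margin and false for the one with it. The idea is simply to reserve part of the container for memory that the heap and off-heap settings do not account for, such as the direct memory or JNI allocations mentioned in the deleted comment.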