[
https://issues.apache.org/jira/browse/FLINK-15906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17229817#comment-17229817
]
yang gang commented on FLINK-15906:
-----------------------------------
{code:java}
Closing TaskExecutor connection container_1597847003686_0079_01_000121.
Because: Container
[pid=4269,containerID=container_1597847003686_0079_01_000121] is running beyond
physical memory limits. Current usage: 20.0 GB of 20 GB physical memory used;
24.9 GB of 100 GB virtual memory used. Killing container.
Dump of the process-tree for container_1597847003686_0079_01_000121 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS)
VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 4298 4269 4269 4269 (java) 104835705 33430931 26634625024 5242644
/usr/local/jdk1.8/bin/java -Xmx10871635848 -Xms10871635848
-XX:MaxDirectMemorySize=1207959552 -XX:MaxMetaspaceSize=268435456 -server
-XX:+UseConcMarkSweepGC -XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSInitiatingOccupancyFraction=75 -XX:ParallelGCThreads=4
-XX:+AlwaysPreTouch -XX:+UseCMSCompactAtFullCollection
-XX:CMSFullGCsBeforeCompaction=3 -DjobName=ck_local_growthline_new-10
-Dlog.file=/data2/yarn/containers/application_1597847003686_0079/container_1597847003686_0079_01_000121/taskmanager.log
-Dlog4j.configuration=file:./log4j.properties
org.apache.flink.yarn.YarnTaskExecutorRunner -D
taskmanager.memory.framework.off-heap.size=134217728b -D
taskmanager.memory.network.max=1073741824b -D
taskmanager.memory.network.min=1073741824b -D
taskmanager.memory.framework.heap.size=134217728b -D
taskmanager.memory.managed.size=8053063800b -D taskmanager.cpu.cores=10.0 -D
taskmanager.memory.task.heap.size=10737418120b -D
taskmanager.memory.task.off-heap.size=0b --configDir .
-Djobmanager.rpc.address={address} -Dweb.port=0
-Dweb.tmpdir=/tmp/flink-web-0874be2a-720d-443c-a069-0bb1fad69433
-Djobmanager.rpc.port=36047 -Drest.address={address}
|- 4269 4267 4269 4269 (bash) 0 0 115904512 359 /bin/bash -c
/usr/local/jdk1.8/bin/java -Xmx10871635848 -Xms10871635848
-XX:MaxDirectMemorySize=1207959552 -XX:MaxMetaspaceSize=268435456 -server
-XX:+UseConcMarkSweepGC -XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSInitiatingOccupancyFraction=75 -XX:ParallelGCThreads=4
-XX:+AlwaysPreTouch -XX:+UseCMSCompactAtFullCollection
-XX:CMSFullGCsBeforeCompaction=3 -DjobName=ck_local_growthline_new-10
-Dlog.file=/data2/yarn/containers/application_1597847003686_0079/container_1597847003686_0079_01_000121/taskmanager.log
-Dlog4j.configuration=file:./log4j.properties
org.apache.flink.yarn.YarnTaskExecutorRunner -D
taskmanager.memory.framework.off-heap.size=134217728b -D
taskmanager.memory.network.max=1073741824b -D
taskmanager.memory.network.min=1073741824b -D
taskmanager.memory.framework.heap.size=134217728b -D
taskmanager.memory.managed.size=8053063800b -D taskmanager.cpu.cores=10.0 -D
taskmanager.memory.task.heap.size=10737418120b -D
taskmanager.memory.task.off-heap.size=0b --configDir .
-Djobmanager.rpc.address={address} -Dweb.port='0'
-Dweb.tmpdir='/tmp/flink-web-0874be2a-720d-443c-a069-0bb1fad69433'
-Djobmanager.rpc.port='36047' -Drest.address={address} 1>
/data2/yarn/containers/application_1597847003686_0079/container_1597847003686_0079_01_000121/taskmanager.out
2>
/data2/yarn/containers/application_1597847003686_0079/container_1597847003686_0079_01_000121/taskmanager.err
{code}
[~xintongsong] I have run into the same problem. The job computes DAU
metrics, and the exception occurs only occasionally. I have checked the job's
memory metrics and logs but found nothing useful. Could you advise how to
track this down?
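The flags in the command line above can be cross-checked against each other. The following is only a sketch, not Flink code: the class and method names are made up for illustration, the numbers are copied from the log, and it assumes that YARN's "20 GB" limit means 20 GiB. It also assumes, and this is worth checking against the Flink memory documentation for the version in use, that -Xmx covers framework plus task heap, that -XX:MaxDirectMemorySize covers framework off-heap, task off-heap and network memory, and that managed memory is allocated natively outside both.

```java
/** Sketch: reconstruct the memory budget from the TaskExecutor command line above. */
public class TaskManagerBudgetSketch {

    // Values copied from the command line in the log (all in bytes).
    static final long XMX = 10_871_635_848L;            // -Xmx
    static final long MAX_DIRECT = 1_207_959_552L;      // -XX:MaxDirectMemorySize
    static final long MAX_METASPACE = 268_435_456L;     // -XX:MaxMetaspaceSize
    static final long FRAMEWORK_HEAP = 134_217_728L;    // taskmanager.memory.framework.heap.size
    static final long TASK_HEAP = 10_737_418_120L;      // taskmanager.memory.task.heap.size
    static final long FRAMEWORK_OFF_HEAP = 134_217_728L; // taskmanager.memory.framework.off-heap.size
    static final long TASK_OFF_HEAP = 0L;               // taskmanager.memory.task.off-heap.size
    static final long NETWORK = 1_073_741_824L;         // taskmanager.memory.network.min/max
    static final long MANAGED = 8_053_063_800L;         // taskmanager.memory.managed.size

    // Assumption: the "20 GB" reported by YARN is 20 GiB.
    static final long CONTAINER_LIMIT = 20L * 1024 * 1024 * 1024;

    /** Heap components that are assumed to add up to -Xmx. */
    static long heapBudget() {
        return FRAMEWORK_HEAP + TASK_HEAP;
    }

    /** Off-heap components that are assumed to add up to MaxDirectMemorySize. */
    static long directBudget() {
        return FRAMEWORK_OFF_HEAP + TASK_OFF_HEAP + NETWORK;
    }

    /** Everything that has an explicit budget on the command line. */
    static long explicitBudget() {
        return XMX + MAX_DIRECT + MAX_METASPACE + MANAGED;
    }

    /** What is left of the container for untracked native memory (JVM overhead). */
    static long headroom() {
        return CONTAINER_LIMIT - explicitBudget();
    }

    public static void main(String[] args) {
        System.out.println(heapBudget() == XMX);          // true
        System.out.println(directBudget() == MAX_DIRECT); // true
        System.out.println(headroom());                   // 1073741824
    }
}
```

If these assumptions hold, the explicit budgets leave exactly 1 GiB of the container for everything else in the process (thread stacks, code cache, GC structures, native allocations made by libraries). RSS reaching the 20 GB limit would then mean that either this untracked native usage grew beyond the remaining 1 GiB or one of the budgeted pools was overrun. The sketch cannot tell which.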
> physical memory exceeded causing being killed by yarn
> -----------------------------------------------------
>
> Key: FLINK-15906
> URL: https://issues.apache.org/jira/browse/FLINK-15906
> Project: Flink
> Issue Type: Bug
> Components: Deployment / YARN
> Reporter: liupengcheng
> Priority: Major
>
> Recently, we encountered this issue when testing TPC-DS queries with 100 GB
> of data. I first hit it when I only set
> `taskmanager.memory.total-process.size` to `4g` with the `-tm` option. I then
> tried to increase the JVM overhead size with the following arguments, but it
> still failed.
> {code:java}
> taskmanager.memory.jvm-overhead.min: 640m
> taskmanager.memory.jvm-metaspace: 128m
> taskmanager.memory.task.heap.size: 1408m
> taskmanager.memory.framework.heap.size: 128m
> taskmanager.memory.framework.off-heap.size: 128m
> taskmanager.memory.managed.size: 1408m
> taskmanager.memory.shuffle.max: 256m
> {code}
> {code:java}
> java.lang.Exception: [2020-02-05 11:31:32.345]Container
> [pid=101677,containerID=container_e08_1578903621081_4785_01_000051] is
> running 46342144B beyond the 'PHYSICAL' memory limit. Current usage: 4.04 GB
> of 4 GB physical memory used; 17.68 GB of 40 GB virtual memory used. Killing
> container.
> Dump of the process-tree for container_e08_1578903621081_4785_01_000051 :
> |- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS)
> VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
> |- 101938 101677 101677 101677 (java) 25762 3571 18867417088 1059157
> /opt/soft/openjdk1.8.0/bin/java
> -Dhadoop.root.logfile=syslog -Xmx1610612736 -Xms1610612736
> -XX:MaxDirectMemorySize=402653184 -XX:MaxMetaspaceSize=134217728
> -Dlog.file=/home/work/hdd5/yarn/zjyprc-analysis/nodemanager/application_1578903621081_4785/container_e08_1578903621081_4785_01_000051/taskmanager.log
> -Dlog4j.configuration=file:./log4j.properties
> org.apache.flink.yarn.YarnTaskExecutorRunner -D
> taskmanager.memory.shuffle.max=268435456b -D
> taskmanager.memory.framework.off-heap.size=134217728b -D
> taskmanager.memory.framework.heap.size=134217728b -D
> taskmanager.memory.managed.size=1476395008b -D taskmanager.cpu.cores=1.0 -D
> taskmanager.memory.task.heap.size=1476395008b -D
> taskmanager.memory.task.off-heap.size=0b -D
> taskmanager.memory.shuffle.min=268435456b --configDir .
> -Djobmanager.rpc.address=zjy-hadoop-prc-st2805.bj -Dweb.port=0
> -Dweb.tmpdir=/tmp/flink-web-4bf6cd3a-a6e1-4b46-b140-b8ac7bdffbeb
> -Djobmanager.rpc.port=36769 -Dtaskmanager.memory.managed.size=1476395008b
> -Drest.address=zjy-hadoop-prc-st2805.bj
> |- 101677 101671 101677 101677 (bash) 1 1 118030336 733 /bin/bash -c
> /opt/soft/openjdk1.8.0/bin/java
> -Dhadoop.root.logfile=syslog -Xmx1610612736 -Xms1610612736
> -XX:MaxDirectMemorySize=402653184 -XX:MaxMetaspaceSize=134217728
> -Dlog.file=/home/work/hdd5/yarn/zjyprc-analysis/nodemanager/application_1578903621081_4785/container_e08_1578903621081_4785_01_000051/taskmanager.log
> -Dlog4j.configuration=file:./log4j.properties
> org.apache.flink.yarn.YarnTaskExecutorRunner -D
> taskmanager.memory.shuffle.max=268435456b -D
> taskmanager.memory.framework.off-heap.size=134217728b -D
> taskmanager.memory.framework.heap.size=134217728b -D
> taskmanager.memory.managed.size=1476395008b -D taskmanager.cpu.cores=1.0 -D
> taskmanager.memory.task.heap.size=1476395008b -D
> taskmanager.memory.task.off-heap.size=0b -D
> taskmanager.memory.shuffle.min=268435456b --configDir .
> -Djobmanager.rpc.address=zjy-hadoop-prc-st2805.bj -Dweb.port=0
> -Dweb.tmpdir=/tmp/flink-web-4bf6cd3a-a6e1-4b46-b140-b8ac7bdffbeb
> -Djobmanager.rpc.port=36769 -Dtaskmanager.memory.managed.size=1476395008b
> -Drest.address=zjy-hadoop-prc-st2805.bj 1>
> /home/work/hdd5/yarn/zjyprc-analysis/nodemanager/application_1578903621081_4785/container_e08_1578903621081_4785_01_000051/taskmanager.out
> 2>
> /home/work/hdd5/yarn/zjyprc-analysis/nodemanager/application_1578903621081_4785/container_e08_1578903621081_4785_01_000051/taskmanager.err
> {code}
> I suspect there are some leaks or unexpected off-heap memory usage.
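
The numbers in the quoted report can be checked by hand. The sketch below is illustrative only: the class and method names are hypothetical, and it assumes a 4 KiB page size and that "4 GB" and "20 GB" mean 4 GiB and 20 GiB. It sums the configured components from the description and converts the RSSMEM_USAGE(PAGES) column of each process tree into bytes.

```java
/** Sketch: compare configured memory and measured RSS with the YARN container limit. */
public class RssOvershootSketch {

    // Assumption: the NodeManager host uses 4 KiB memory pages.
    static final long PAGE_SIZE = 4096L;
    static final long GIB = 1024L * 1024 * 1024;

    /**
     * Sum of the options listed in the issue description, in MiB:
     * jvm-overhead.min, jvm-metaspace, task.heap, framework.heap,
     * framework.off-heap, managed, shuffle.max.
     */
    static long configuredTotalMiB() {
        return 640 + 128 + 1408 + 128 + 128 + 1408 + 256;
    }

    /** Converts the RSS page counts of all processes in the tree to bytes. */
    static long rssBytes(long... rssPages) {
        long total = 0;
        for (long pages : rssPages) {
            total += pages * PAGE_SIZE;
        }
        return total;
    }

    /** Bytes by which the process tree exceeds the container limit. */
    static long overshoot(long limitBytes, long... rssPages) {
        return rssBytes(rssPages) - limitBytes;
    }

    public static void main(String[] args) {
        // The configured components fill the 4g container completely.
        System.out.println(configuredTotalMiB());                     // 4096

        // Original report: java (1059157 pages) + bash (733 pages), 4 GiB limit.
        System.out.println(overshoot(4 * GIB, 1_059_157L, 733L));     // 46342144

        // Container in the comment: java (5242644 pages) + bash (359 pages), 20 GiB limit.
        System.out.println(overshoot(20 * GIB, 5_242_644L, 359L));    // 503808
    }
}
```

Under these assumptions the first result matches the "46342144B beyond the 'PHYSICAL' memory limit" in the exception, which suggests that YARN compares the summed RSS of the whole process tree with the limit. The container in the comment was killed at roughly 0.5 MB over its limit by the same calculation.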