[
https://issues.apache.org/jira/browse/SPARK-31149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Arsenii Venherak updated SPARK-31149:
-------------------------------------
Description:
{code:java}
2020-03-10 10:15:00,257 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
Memory usage of ProcessTree 327523 for container-id container_e25_1583
485217113_0347_01_000042: 1.9 GB of 2 GB physical memory used; 39.5 GB of 4.2
GB virtual memory used
2020-03-10 10:15:05,135 INFO
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
Memory usage of ProcessTree 327523 for container-id container_e25_1583
485217113_0347_01_000042: 3.6 GB of 2 GB physical memory used; 41.1 GB of 4.2
GB virtual memory used
2020-03-10 10:15:05,136 WARN
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
Process tree for container: container_e25_1583485217113_0347_01_000042
has processes older than 1 iteration running over the configured limit.
Limit=2147483648, current usage = 3915513856
2020-03-10 10:15:05,136 WARN
org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl:
Container [pid=327523,containerID=container_e25_1583485217113_0347_01_
000042] is running beyond physical memory limits. Current usage: 3.6 GB of 2 GB
physical memory used; 41.1 GB of 4.2 GB virtual memory used. Killing container.
Dump of the process-tree for container_e25_1583485217113_0347_01_000042 :
|- 327535 327523 327523 327523 (java) 1611 111 4044427264 172306
/usr/lib/jvm/java-1.8.0-openjdk-1.8.0.242.b08-0.el7_7.x86_64/jre/bin/java
-server -Xmx1024m -Djava.io.tmpdir=/data/s
cratch/yarn/usercache/u689299/appcache/application_1583485217113_0347/container_e25_1583485217113_0347_01_000042/tmp
-Dspark.ssl.trustStore=/opt/mapr/conf/ssl_truststore -Dspark.authenticat
e.enableSaslEncryption=true -Dspark.driver.port=40653
-Dspark.network.timeout=7200 -Dspark.ssl.keyStore=/opt/mapr/conf/ssl_keystore
-Dspark.network.sasl.serverAlwaysEncrypt=true -Dspark.ssl
.enabled=true -Dspark.ssl.protocol=TLSv1.2 -Dspark.ssl.fs.enabled=true
-Dspark.ssl.ui.enabled=false -Dspark.authenticate=true
-Dspark.yarn.app.container.log.dir=/opt/mapr/hadoop/hadoop-2.7.
0/logs/userlogs/application_1583485217113_0347/container_e25_1583485217113_0347_01_000042
-XX:OnOutOfMemoryError=kill %p
org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url
spark://[email protected]:40653 --executor-id
40 --hostname bd02slsc0519.wellsfargo.com --cores 1 --app-id
application_1583485217113_0347 --user-class-path
file:/data/scratch/yarn/usercache/u689299/appcache/application_1583485217113_0347/container_e25_1583485217113_0347_01_000042/__app__.jar
{code}
After that, many pyspark.daemon processes are left behind, e.g.:
/apps/anaconda3-5.3.0/bin/python -m pyspark.daemon
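On a typical Linux host (without a subreaper configured), a daemon that outlives its executor JVM is reparented to PID 1, which makes the leftovers easy to spot. A minimal detection sketch, assuming `ps` is available; the helper names here are illustrative and not part of Spark:

```python
import subprocess

def orphaned_daemons(ps_output):
    """Parse `ps -eo pid,ppid,args` output and return PIDs of
    pyspark.daemon processes whose parent is init (PPID 1),
    i.e. daemons left behind after their executor JVM was killed."""
    orphans = []
    for line in ps_output.splitlines()[1:]:  # skip the ps header row
        fields = line.split(None, 2)
        if len(fields) < 3:
            continue
        pid, ppid, args = fields
        if ppid == "1" and "pyspark.daemon" in args:
            orphans.append(int(pid))
    return orphans

def find_orphaned_daemons():
    """Run ps on the current host and return orphaned daemon PIDs."""
    out = subprocess.run(["ps", "-eo", "pid,ppid,args"],
                         capture_output=True, text=True, check=True).stdout
    return orphaned_daemons(out)
```

The returned PIDs could then be cleaned up manually (e.g. with `kill`) until the underlying bug is fixed.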
> PySpark job not killing Spark Daemon processes after the executor is killed
> due to OOM
> --------------------------------------------------------------------------------------
>
> Key: SPARK-31149
> URL: https://issues.apache.org/jira/browse/SPARK-31149
> Project: Spark
> Issue Type: Bug
> Components: PySpark
> Affects Versions: 2.4.5
> Reporter: Arsenii Venherak
> Priority: Major
> Fix For: 2.4.5
>
>
> (Description log and details as above.)
--
This message was sent by Atlassian Jira
(v8.3.4#803005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]