[ https://issues.apache.org/jira/browse/SPARK-31149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arsenii Venherak updated SPARK-31149:
-------------------------------------
    Description: 
{code:java}
2020-03-10 10:15:00,257 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 327523 for container-id container_e25_1583485217113_0347_01_000042: 1.9 GB of 2 GB physical memory used; 39.5 GB of 4.2 GB virtual memory used
2020-03-10 10:15:05,135 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 327523 for container-id container_e25_1583485217113_0347_01_000042: 3.6 GB of 2 GB physical memory used; 41.1 GB of 4.2 GB virtual memory used
2020-03-10 10:15:05,136 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Process tree for container: container_e25_1583485217113_0347_01_000042 has processes older than 1 iteration running over the configured limit. Limit=2147483648, current usage = 3915513856
2020-03-10 10:15:05,136 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Container [pid=327523,containerID=container_e25_1583485217113_0347_01_000042] is running beyond physical memory limits. Current usage: 3.6 GB of 2 GB physical memory used; 41.1 GB of 4.2 GB virtual memory used. Killing container.
Dump of the process-tree for container_e25_1583485217113_0347_01_000042 :
        |- 327535 327523 327523 327523 (java) 1611 111 4044427264 172306 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.242.b08-0.el7_7.x86_64/jre/bin/java -server -Xmx1024m -Djava.io.tmpdir=/data/scratch/yarn/usercache/u689299/appcache/application_1583485217113_0347/container_e25_1583485217113_0347_01_000042/tmp -Dspark.ssl.trustStore=/opt/mapr/conf/ssl_truststore -Dspark.authenticate.enableSaslEncryption=true -Dspark.driver.port=40653 -Dspark.network.timeout=7200 -Dspark.ssl.keyStore=/opt/mapr/conf/ssl_keystore -Dspark.network.sasl.serverAlwaysEncrypt=true -Dspark.ssl.enabled=true -Dspark.ssl.protocol=TLSv1.2 -Dspark.ssl.fs.enabled=true -Dspark.ssl.ui.enabled=false -Dspark.authenticate=true -Dspark.yarn.app.container.log.dir=/opt/mapr/hadoop/hadoop-2.7.0/logs/userlogs/application_1583485217113_0347/container_e25_1583485217113_0347_01_000042 -XX:OnOutOfMemoryError=kill %p org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@bd02slse0201.wellsfargo.com:40653 --executor-id 40 --hostname bd02slsc0519.wellsfargo.com --cores 1 --app-id application_1583485217113_0347 --user-class-path file:/data/scratch/yarn/usercache/u689299/appcache/application_1583485217113_0347/container_e25_1583485217113_0347_01_000042/__app__.jar
{code}
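For context on the numbers above: the executor JVM runs with -Xmx1024m inside a 2 GB (2147483648-byte) YARN container, yet the monitored process tree reaches 3.6 GB, because the pyspark.daemon workers forked by the executor count against the container limit but are not bounded by the JVM heap. Below is a minimal sizing sketch using the standard spark.executor.memoryOverhead and spark.executor.pyspark.memory settings; the values are illustrative assumptions, not a recommendation from this report, and larger limits only make the YARN kill less likely; they do not address the leftover daemons described next.
{code:python}
# Illustrative sizing only; property names are standard Spark 2.4 settings,
# but the values are assumptions for this sketch.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("pyspark-memory-sizing-sketch")
    # Matches the -Xmx1024m seen in the process-tree dump above.
    .config("spark.executor.memory", "1g")
    # Extra room in the YARN container for non-heap usage, including the
    # pyspark.daemon worker processes forked by each executor.
    .config("spark.executor.memoryOverhead", "1g")
    # Since Spark 2.4, Python worker memory can additionally be capped so
    # workers spill to disk instead of growing past the container limit.
    .config("spark.executor.pyspark.memory", "512m")
    .getOrCreate()
)
{code}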
 

 

After that, many pyspark.daemon processes are left behind, e.g.:
 /apps/anaconda3-5.3.0/bin/python -m pyspark.daemon
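A quick way to confirm the leak on an affected NodeManager host is to look for pyspark.daemon processes that have been reparented to PID 1 after their executor container was killed. The helper below is a hypothetical sketch, not part of this report: psutil and the PPID-1 heuristic are assumptions.
{code:python}
# Hypothetical clean-up helper (assumes psutil is installed on the host and
# that orphaned daemons show a "pyspark.daemon" command line with PPID 1).
import psutil

def find_orphaned_pyspark_daemons():
    orphans = []
    for proc in psutil.process_iter(attrs=["pid", "ppid", "cmdline"]):
        cmdline = " ".join(proc.info["cmdline"] or [])
        if "pyspark.daemon" in cmdline and proc.info["ppid"] == 1:
            orphans.append(proc)
    return orphans

if __name__ == "__main__":
    for proc in find_orphaned_pyspark_daemons():
        print(proc.pid, " ".join(proc.info["cmdline"]))
        # proc.terminate()  # uncomment to actually reap the orphans
{code}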


> PySpark job not killing Spark Daemon processes after the executor is killed 
> due to OOM
> --------------------------------------------------------------------------------------
>
>                 Key: SPARK-31149
>                 URL: https://issues.apache.org/jira/browse/SPARK-31149
>             Project: Spark
>          Issue Type: Bug
>          Components: PySpark
>    Affects Versions: 2.4.5
>            Reporter: Arsenii Venherak
>            Priority: Major
>             Fix For: 2.4.5
>
>
> {code:java}
> 2020-03-10 10:15:00,257 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 327523 for container-id container_e25_1583485217113_0347_01_000042: 1.9 GB of 2 GB physical memory used; 39.5 GB of 4.2 GB virtual memory used
> 2020-03-10 10:15:05,135 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 327523 for container-id container_e25_1583485217113_0347_01_000042: 3.6 GB of 2 GB physical memory used; 41.1 GB of 4.2 GB virtual memory used
> 2020-03-10 10:15:05,136 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Process tree for container: container_e25_1583485217113_0347_01_000042 has processes older than 1 iteration running over the configured limit. Limit=2147483648, current usage = 3915513856
> 2020-03-10 10:15:05,136 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Container [pid=327523,containerID=container_e25_1583485217113_0347_01_000042] is running beyond physical memory limits. Current usage: 3.6 GB of 2 GB physical memory used; 41.1 GB of 4.2 GB virtual memory used. Killing container.
> Dump of the process-tree for container_e25_1583485217113_0347_01_000042 :
>         |- 327535 327523 327523 327523 (java) 1611 111 4044427264 172306 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.242.b08-0.el7_7.x86_64/jre/bin/java -server -Xmx1024m -Djava.io.tmpdir=/data/scratch/yarn/usercache/u689299/appcache/application_1583485217113_0347/container_e25_1583485217113_0347_01_000042/tmp -Dspark.ssl.trustStore=/opt/mapr/conf/ssl_truststore -Dspark.authenticate.enableSaslEncryption=true -Dspark.driver.port=40653 -Dspark.network.timeout=7200 -Dspark.ssl.keyStore=/opt/mapr/conf/ssl_keystore -Dspark.network.sasl.serverAlwaysEncrypt=true -Dspark.ssl.enabled=true -Dspark.ssl.protocol=TLSv1.2 -Dspark.ssl.fs.enabled=true -Dspark.ssl.ui.enabled=false -Dspark.authenticate=true -Dspark.yarn.app.container.log.dir=/opt/mapr/hadoop/hadoop-2.7.0/logs/userlogs/application_1583485217113_0347/container_e25_1583485217113_0347_01_000042 -XX:OnOutOfMemoryError=kill %p org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@bd02slse0201.wellsfargo.com:40653 --executor-id 40 --hostname bd02slsc0519.wellsfargo.com --cores 1 --app-id application_1583485217113_0347 --user-class-path file:/data/scratch/yarn/usercache/u689299/appcache/application_1583485217113_0347/container_e25_1583485217113_0347_01_000042/__app__.jar
> {code}
>  
>  
> After that, many pyspark.daemon processes are left behind, e.g.:
>  /apps/anaconda3-5.3.0/bin/python -m pyspark.daemon



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
