Arsenii Venherak created SPARK-31149:
----------------------------------------

             Summary: PySpark job not killing Spark Daemon processes after the 
executor is killed due to OOM
                 Key: SPARK-31149
                 URL: https://issues.apache.org/jira/browse/SPARK-31149
             Project: Spark
          Issue Type: Bug
          Components: PySpark
    Affects Versions: 2.4.5
            Reporter: Arsenii Venherak
             Fix For: 2.4.5


{noformat}
2020-03-10 10:15:00,257 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 327523 for container-id container_e25_1583485217113_0347_01_000042: 1.9 GB of 2 GB physical memory used; 39.5 GB of 4.2 GB virtual memory used
2020-03-10 10:15:05,135 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 327523 for container-id container_e25_1583485217113_0347_01_000042: 3.6 GB of 2 GB physical memory used; 41.1 GB of 4.2 GB virtual memory used
2020-03-10 10:15:05,136 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Process tree for container: container_e25_1583485217113_0347_01_000042 has processes older than 1 iteration running over the configured limit. Limit=2147483648, current usage = 3915513856
2020-03-10 10:15:05,136 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Container [pid=327523,containerID=container_e25_1583485217113_0347_01_000042] is running beyond physical memory limits. Current usage: 3.6 GB of 2 GB physical memory used; 41.1 GB of 4.2 GB virtual memory used. Killing container.
Dump of the process-tree for container_e25_1583485217113_0347_01_000042 :
|- 327535 327523 327523 327523 (java) 1611 111 4044427264 172306 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.242.b08-0.el7_7.x86_64/jre/bin/java -server -Xmx1024m -Djava.io.tmpdir=/data/scratch/yarn/usercache/u689299/appcache/application_1583485217113_0347/container_e25_1583485217113_0347_01_000042/tmp -Dspark.ssl.trustStore=/opt/mapr/conf/ssl_truststore -Dspark.authenticate.enableSaslEncryption=true -Dspark.driver.port=40653 -Dspark.network.timeout=7200 -Dspark.ssl.keyStore=/opt/mapr/conf/ssl_keystore -Dspark.network.sasl.serverAlwaysEncrypt=true -Dspark.ssl.enabled=true -Dspark.ssl.protocol=TLSv1.2 -Dspark.ssl.fs.enabled=true -Dspark.ssl.ui.enabled=false -Dspark.authenticate=true -Dspark.yarn.app.container.log.dir=/opt/mapr/hadoop/hadoop-2.7.0/logs/userlogs/application_1583485217113_0347/container_e25_1583485217113_0347_01_000042 -XX:OnOutOfMemoryError=kill %p org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@bd02slse0201.wellsfargo.com:40653 --executor-id 40 --hostname bd02slsc0519.wellsfargo.com --cores 1 --app-id application_1583485217113_0347 --user-class-path file:/data/scratch/yarn/usercache/u689299/appcache/application_1583485217113_0347/container_e25_1583485217113_0347_01_000042/__app__.jar
{noformat}
 



After YARN kills the container, many orphaned pyspark.daemon processes are left running on the node, e.g.:
/apps/anaconda3-5.3.0/bin/python -m pyspark.daemon
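A minimal sketch for spotting the leaked daemons on an affected node (an illustrative diagnostic, not part of Spark): once the JVM executor is killed, the pyspark.daemon workers lose their parent and are re-parented to PID 1, so they can be found by matching the command line and checking the PPID.

```shell
# List pyspark.daemon processes whose parent has exited (PPID 1 means
# the process was re-parented to init/systemd after the executor died).
# The [p] in the grep pattern keeps grep from matching itself.
ps -eo pid,ppid,cmd | grep '[p]yspark.daemon' | awk '$2 == 1 {print}'
```

On a healthy node this prints nothing; each line it does print is a leaked daemon that survived the container kill.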



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
