Arsenii Venherak created SPARK-31149:
----------------------------------------
Summary: PySpark job not killing Spark Daemon processes after the
executor is killed due to OOM
Key: SPARK-31149
URL: https://issues.apache.org/jira/browse/SPARK-31149
Project: Spark
Issue Type: Bug
Components: PySpark
Affects Versions: 2.4.5
Reporter: Arsenii Venherak
Fix For: 2.4.5
{noformat}
2020-03-10 10:15:00,257 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 327523 for container-id container_e25_1583485217113_0347_01_000042: 1.9 GB of 2 GB physical memory used; 39.5 GB of 4.2 GB virtual memory used
2020-03-10 10:15:05,135 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Memory usage of ProcessTree 327523 for container-id container_e25_1583485217113_0347_01_000042: 3.6 GB of 2 GB physical memory used; 41.1 GB of 4.2 GB virtual memory used
2020-03-10 10:15:05,136 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Process tree for container: container_e25_1583485217113_0347_01_000042 has processes older than 1 iteration running over the configured limit. Limit=2147483648, current usage = 3915513856
2020-03-10 10:15:05,136 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: Container [pid=327523,containerID=container_e25_1583485217113_0347_01_000042] is running beyond physical memory limits. Current usage: 3.6 GB of 2 GB physical memory used; 41.1 GB of 4.2 GB virtual memory used. Killing container.
Dump of the process-tree for container_e25_1583485217113_0347_01_000042 :
	|- 327535 327523 327523 327523 (java) 1611 111 4044427264 172306 /usr/lib/jvm/java-1.8.0-openjdk-1.8.0.242.b08-0.el7_7.x86_64/jre/bin/java -server -Xmx1024m -Djava.io.tmpdir=/data/scratch/yarn/usercache/u689299/appcache/application_1583485217113_0347/container_e25_1583485217113_0347_01_000042/tmp -Dspark.ssl.trustStore=/opt/mapr/conf/ssl_truststore -Dspark.authenticate.enableSaslEncryption=true -Dspark.driver.port=40653 -Dspark.network.timeout=7200 -Dspark.ssl.keyStore=/opt/mapr/conf/ssl_keystore -Dspark.network.sasl.serverAlwaysEncrypt=true -Dspark.ssl.enabled=true -Dspark.ssl.protocol=TLSv1.2 -Dspark.ssl.fs.enabled=true -Dspark.ssl.ui.enabled=false -Dspark.authenticate=true -Dspark.yarn.app.container.log.dir=/opt/mapr/hadoop/hadoop-2.7.0/logs/userlogs/application_1583485217113_0347/container_e25_1583485217113_0347_01_000042 -XX:OnOutOfMemoryError=kill %p org.apache.spark.executor.CoarseGrainedExecutorBackend --driver-url spark://CoarseGrainedScheduler@bd02slse0201.wellsfargo.com:40653 --executor-id 40 --hostname bd02slsc0519.wellsfargo.com --cores 1 --app-id application_1583485217113_0347 --user-class-path file:/data/scratch/yarn/usercache/u689299/appcache/application_1583485217113_0347/container_e25_1583485217113_0347_01_000042/__app__.jar
{noformat}
After the container is killed, many orphaned pyspark.daemon processes are left behind, e.g.:
/apps/anaconda3-5.3.0/bin/python -m pyspark.daemon
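The `kill %p` in the container's `-XX:OnOutOfMemoryError` option most likely terminates only the executor JVM, so the Python pyspark.daemon workers it forked are reparented to init and keep running. A minimal sketch of a helper for spotting such orphans on a NodeManager host (the helper name is hypothetical, not part of Spark or YARN):

```shell
# Hypothetical helper: scan `ps -eo pid,ppid,args` style output for
# pyspark.daemon workers whose parent is gone (reparented to PID 1
# after the container JVM was killed) and print their PIDs.
find_orphaned_daemons() {
    awk '$2 == 1 && /pyspark\.daemon/ { print $1 }'
}

# Usage on a NodeManager host:
#   ps -eo pid,ppid,args | find_orphaned_daemons
```

The PPID-1 check is a heuristic: it will miss daemons adopted by a subreaper instead of init, but it avoids killing workers that still belong to a live container.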
--
This message was sent by Atlassian Jira
(v8.3.4#803005)