[
https://issues.apache.org/jira/browse/SPARK-27927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16884121#comment-16884121
]
Stavros Kontopoulos commented on SPARK-27927:
---------------------------------------------
I think the issue is here:
```
"dag-scheduler-event-loop" #50 daemon prio=5 os_prio=0 tid=0x00007f561ceb1000 nid=0xa6 waiting on condition [0x00007f5619ee4000]
   java.lang.Thread.State: WAITING (parking)
        at sun.misc.Unsafe.park(Native Method)
        - parking to wait for <0x0000000542de6188> (a java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject)
        at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
        at java.util.concurrent.locks.AbstractQueuedSynchronizer$ConditionObject.await(AbstractQueuedSynchronizer.java:2039)
        at java.util.concurrent.LinkedBlockingDeque.takeFirst(LinkedBlockingDeque.java:492)
        at java.util.concurrent.LinkedBlockingDeque.take(LinkedBlockingDeque.java:680)
        at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:47)
```
The code is
[here|https://github.com/apache/spark/blob/aa41dcea4a41899507dfe4ec1eceaabb5edf728f/core/src/main/scala/org/apache/spark/util/EventLoop.scala#L47].
The thread is blocked there on the blocking queue, and although it's a daemon
thread it cannot make progress. I don't know exactly why this happens, but it
looks similar to [https://github.com/apache/spark/pull/24796]. [~zsxwing], thoughts?
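For illustration, here is a minimal, self-contained sketch of the pattern in the stack trace (this is not Spark's actual EventLoop; the class and field names are mine). A daemon thread parks inside LinkedBlockingDeque.take(), which is exactly the WAITING (parking) state in the dump. Note that a parked daemon thread on its own does not keep the JVM alive, so by itself it should not explain a hung driver unless a non-daemon thread is also stuck or the shutdown sequence never runs:

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingDeque;
import java.util.concurrent.TimeUnit;

// Sketch of an EventLoop-style daemon thread (not Spark's implementation).
public class EventLoopSketch {
    static final LinkedBlockingDeque<String> queue = new LinkedBlockingDeque<>();
    static final CountDownLatch handled = new CountDownLatch(1);
    static volatile String lastEvent = null;

    public static void main(String[] args) throws Exception {
        Thread loop = new Thread(() -> {
            try {
                while (!Thread.currentThread().isInterrupted()) {
                    // Parks via AbstractQueuedSynchronizer / Unsafe.park --
                    // the "WAITING (parking)" state shown in the thread dump.
                    lastEvent = queue.take();
                    handled.countDown();
                }
            } catch (InterruptedException e) {
                // Unblocking take() requires an interrupt (or an element);
                // if nothing ever interrupts the thread, it parks forever.
            }
        });
        loop.setDaemon(true); // a daemon thread alone should not keep the JVM alive
        loop.start();

        queue.put("job-finished");
        handled.await(5, TimeUnit.SECONDS);
        System.out.println("handled " + lastEvent);
        // The loop thread is still parked in take() here, but since it is a
        // daemon, the JVM can still exit once main (non-daemon) returns.
    }
}
```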
> driver pod hangs with pyspark 2.4.3 and master on kubernetes
> ------------------------------------------------------------
>
> Key: SPARK-27927
> URL: https://issues.apache.org/jira/browse/SPARK-27927
> Project: Spark
> Issue Type: Bug
> Components: Kubernetes, PySpark
> Affects Versions: 3.0.0, 2.4.3
> Environment: k8s 1.11.9
> spark 2.4.3 and master branch.
> Reporter: Edwin Biemond
> Priority: Major
> Attachments: driver_threads.log, executor_threads.log
>
>
> When we run a simple pyspark script on Spark 2.4.3 or 3.0.0, the driver pod
> hangs and never calls the shutdown hook.
> {code:java}
> #!/usr/bin/env python
> from __future__ import print_function
> import os
> import os.path
> import sys
> # Are we really in Spark?
> from pyspark.sql import SparkSession
> spark = SparkSession.builder.appName('hello_world').getOrCreate()
> print('Our Spark version is {}'.format(spark.version))
> print('Spark context information: {} parallelism={} python version={}'.format(
> str(spark.sparkContext),
> spark.sparkContext.defaultParallelism,
> spark.sparkContext.pythonVer
> ))
> {code}
> When we run this on Kubernetes, the driver and executor just hang, although
> we do see the output of the Python script.
> {noformat}
> bash-4.2# cat stdout.log
> Our Spark version is 2.4.3
> Spark context information: <SparkContext master=k8s://https://kubernetes.default.svc:443 appName=hello_world> parallelism=2 python version=3.6
> {noformat}
> What works
> * a simple python with a print works fine on 2.4.3 and 3.0.0
> * same setup on 2.4.0
> * 2.4.3 spark-submit with the above pyspark
>
>
>
--
This message was sent by Atlassian JIRA
(v7.6.14#76016)