[ https://issues.apache.org/jira/browse/SPARK-27927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854471#comment-16854471 ]

Edwin Biemond commented on SPARK-27927:
---------------------------------------

Same here, but for 2.4.0; in this case the shutdown hook is called on the
driver.
{noformat}

Our Spark version is 2.4.0
Spark context information: <SparkContext 
master=k8s://https://kubernetes.default.svc:443 appName=hello_world> 
parallelism=2 python version=3.6


19/06/03 09:53:23 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable
19/06/03 09:53:28 INFO SparkContext: Running Spark version 2.4.0
19/06/03 09:53:28 INFO SparkContext: Submitted application: hello_world
19/06/03 09:53:28 INFO SecurityManager: Changing view acls to: root
19/06/03 09:53:28 INFO SecurityManager: Changing modify acls to: root
19/06/03 09:53:28 INFO SecurityManager: Changing view acls groups to:
19/06/03 09:53:28 INFO SecurityManager: Changing modify acls groups to:
19/06/03 09:53:28 INFO SecurityManager: SecurityManager: authentication 
disabled; ui acls disabled; users with view permissions: Set(root); groups with 
view permissions: Set(); users with modify permissions: Set(root); groups with 
modify permissions: Set()
19/06/03 09:53:28 INFO Utils: Successfully started service 'sparkDriver' on 
port 7078.
19/06/03 09:53:28 INFO SparkEnv: Registering MapOutputTracker
19/06/03 09:53:28 INFO SparkEnv: Registering BlockManagerMaster
19/06/03 09:53:28 INFO BlockManagerMasterEndpoint: Using 
org.apache.spark.storage.DefaultTopologyMapper for getting topology information
19/06/03 09:53:28 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
19/06/03 09:53:28 INFO DiskBlockManager: Created local directory at 
/var/data/spark-4d779a23-103a-436d-9491-1abbf2952f45/blockmgr-aa7e1365-4aa5-409e-aaea-d2069e9a7ccf
19/06/03 09:53:28 INFO MemoryStore: MemoryStore started with capacity 3.6 GB
19/06/03 09:53:28 INFO SparkEnv: Registering OutputCommitCoordinator
19/06/03 09:53:29 INFO Utils: Successfully started service 'SparkUI' on port 
4040.
19/06/03 09:53:29 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at 
http://spark-1b8dd23cd07c4f4bbb01e33ef0046410-1559555595911-driver-svc.24f2k7cztfza.svc:4040
19/06/03 09:53:29 INFO SparkContext: Added file 
oci://code-assets@paasdevsss/pyspark_min.py at 
oci://code-assets@paasdevsss/pyspark_min.py with timestamp 1559555609257
19/06/03 09:53:29 INFO Utils: Fetching 
oci://code-assets@paasdevsss/pyspark_min.py to 
/var/data/spark-4d779a23-103a-436d-9491-1abbf2952f45/spark-64951e10-ce58-4bf8-a38b-6b852a90234f/userFiles-39b90ce9-3b44-4ed3-b1c8-9d0863b3c445/fetchFileTemp8597444061541435463.tmp
19/06/03 09:53:30 INFO ExecutorPodsAllocator: Going to request 1 executors from 
Kubernetes.
19/06/03 09:53:30 INFO Version: HV000001: Hibernate Validator 5.2.4.Final
19/06/03 09:53:30 INFO Utils: Successfully started service 
'org.apache.spark.network.netty.NettyBlockTransferService' on port 7079.
19/06/03 09:53:30 INFO NettyBlockTransferService: Server created on 
spark-1b8dd23cd07c4f4bbb01e33ef0046410-1559555595911-driver-svc.24f2k7cztfza.svc:7079
19/06/03 09:53:30 INFO BlockManager: Using 
org.apache.spark.storage.RandomBlockReplicationPolicy for block replication 
policy
19/06/03 09:53:30 INFO BlockManagerMaster: Registering BlockManager 
BlockManagerId(driver, 
spark-1b8dd23cd07c4f4bbb01e33ef0046410-1559555595911-driver-svc.24f2k7cztfza.svc,
 7079, None)
19/06/03 09:53:30 INFO BlockManagerMasterEndpoint: Registering block manager 
spark-1b8dd23cd07c4f4bbb01e33ef0046410-1559555595911-driver-svc.24f2k7cztfza.svc:7079
 with 3.6 GB RAM, BlockManagerId(driver, 
spark-1b8dd23cd07c4f4bbb01e33ef0046410-1559555595911-driver-svc.24f2k7cztfza.svc,
 7079, None)
19/06/03 09:53:30 INFO BlockManagerMaster: Registered BlockManager 
BlockManagerId(driver, 
spark-1b8dd23cd07c4f4bbb01e33ef0046410-1559555595911-driver-svc.24f2k7cztfza.svc,
 7079, None)
19/06/03 09:53:30 INFO BlockManager: Initialized BlockManager: 
BlockManagerId(driver, 
spark-1b8dd23cd07c4f4bbb01e33ef0046410-1559555595911-driver-svc.24f2k7cztfza.svc,
 7079, None)
19/06/03 09:53:30 INFO TelemetrySink: Instantiating TelemetrySink
19/06/03 09:53:30 INFO Services: Registering new service: 
Services.BasicService(serviceName=AUTH, serviceEndpointPrefix=auth, 
serviceEndpointTemplate=null)
19/06/03 09:53:31 INFO Region: Loaded service 'AUTH' endpoint mappings: 
{US_PHOENIX_1=https://auth.us-phoenix-1.oraclecloud.com, 
EU_FRANKFURT_1=https://auth.eu-frankfurt-1.oraclecloud.com, 
US_GOV_CHICAGO_1=https://auth.us-gov-chicago-1.oraclegovcloud.com, 
CA_TORONTO_1=https://auth.ca-toronto-1.oraclecloud.com, 
US_ASHBURN_1=https://auth.us-ashburn-1.oraclecloud.com, 
US_LUKE_1=https://auth.us-luke-1.oraclegovcloud.com, 
UK_LONDON_1=https://auth.uk-london-1.oraclecloud.com, 
US_LANGLEY_1=https://auth.us-langley-1.oraclegovcloud.com, 
US_GOV_PHOENIX_1=https://auth.us-gov-phoenix-1.oraclegovcloud.com, 
US_GOV_ASHBURN_1=https://auth.us-gov-ashburn-1.oraclegovcloud.com}
19/06/03 09:53:31 INFO URLBasedX509CertificateSupplier: suppressX509Workaround 
flag set to false
19/06/03 09:53:31 INFO JavaRuntimeUtils: Determined JRE version as Java_8
19/06/03 09:53:31 INFO DefaultConfigurator: Setting connector provider to 
HttpUrlConnectorProvider
19/06/03 09:53:31 INFO OracleHttpClientBuilder: DynamicSslContextProviderConfig 
is not configured. Attempting to use tlsConfig
19/06/03 09:53:32 INFO OverlayHttpClientBuilder: 
DynamicSslContextProviderConfig is not configured. Attempting to use tlsConfig
19/06/03 09:53:32 INFO TelemetrySink: DianogaReporter created and registered 
with metrics
19/06/03 09:53:32 INFO TelemetrySink: ScheduledMetricReporter created
19/06/03 09:53:32 INFO TelemetrySink: Starting ScheduledMetricReporter
19/06/03 09:53:32 INFO SparkContext: Registered listener 
oracle.dfcs.spark.listener.JobListener
19/06/03 09:53:32 INFO JobListener: Thread 70 called onApplicationStart...
19/06/03 09:53:32 INFO SparkUIIngressServiceBuilder: Intialize 
SparkUIIngressService using SparkConf...
19/06/03 09:53:32 INFO SparkUIIngressServiceBuilder: masterURL - 
https://kubernetes.default.svc:443, nameSpace - 24f2k7cztfza, 
backendServiceName - 
spark-1b8dd23cd07c4f4bbb01e33ef0046410-1559555595911-driver-svc, 
ingressServiceName - 
spark-1b8dd23cd07c4f4bbb01e33ef0046410-1559555595911-ingress, runId - 
1b8dd23c-d07c-4f4b-bb01-e33ef0046410
19/06/03 09:53:32 INFO SparkUIIngressServiceBuilder: Building 
SparkUIIngressService...
19/06/03 09:53:32 INFO SparkUIIngressServiceBuilder: ---
apiVersion: "extensions/v1beta1"
kind: "Ingress"
metadata:
annotations:
nginx.ingress.kubernetes.io/rewrite-target: "/"
nginx.ingress.kubernetes.io/configuration-snippet: "rewrite 
/sparkui/1b8dd23c-d07c-4f4b-bb01-e33ef0046410/(.*)$\
\ /$1 break;\nproxy_set_header Accept-Encoding \"\";\nsub_filter_types 
text/html\
\ application/javascript;\nsub_filter \"/static/\" 
\"/sparkui/1b8dd23c-d07c-4f4b-bb01-e33ef0046410/static/\"\
;\nsub_filter \"/jobs/\" 
\"/sparkui/1b8dd23c-d07c-4f4b-bb01-e33ef0046410/jobs/\"\
;\nsub_filter \"/stages/\" 
\"/sparkui/1b8dd23c-d07c-4f4b-bb01-e33ef0046410/stages/\"\
;\nsub_filter \"/storage/\" 
\"/sparkui/1b8dd23c-d07c-4f4b-bb01-e33ef0046410/storage/\"\
;\nsub_filter \"/environment/\" 
\"/sparkui/1b8dd23c-d07c-4f4b-bb01-e33ef0046410/environment/\"\
;\nsub_filter \"/executors/\" 
\"/sparkui/1b8dd23c-d07c-4f4b-bb01-e33ef0046410/executors/\"\
;\nsub_filter \"/streaming/\" 
\"/sparkui/1b8dd23c-d07c-4f4b-bb01-e33ef0046410/streaming/\"\
;\nsub_filter \"/SQL/\" \"/sparkui/1b8dd23c-d07c-4f4b-bb01-e33ef0046410/SQL/\"\
;\nsub_filter \"/api/\" \"/sparkui/1b8dd23c-d07c-4f4b-bb01-e33ef0046410/api/\"\
;\nsub_filter \"</head>\" \"<script 
src='https://cdnjs.cloudflare.com/ajax/libs/iframe-resizer/3.6.5/iframeResizer.contentWindow.js'></script></head>\"\
;\nsub_filter_once off;\n"
nginx.ingress.kubernetes.io/proxy-redirect-from: "http://$host/";
nginx.ingress.kubernetes.io/proxy-redirect-to: 
"$scheme://$host/sparkui/1b8dd23c-d07c-4f4b-bb01-e33ef0046410/"
nginx.ingress.kubernetes.io/ssl-redirect: "false"
finalizers: []
labels:
app: "spark-1b8dd23cd07c4f4bbb01e33ef0046410-1559555595911-ingress"
name: "spark-1b8dd23cd07c4f4bbb01e33ef0046410-1559555595911-ingress"
namespace: "24f2k7cztfza"
ownerReferences: []
spec:
rules:
- http:
paths:
- backend:
serviceName: "spark-1b8dd23cd07c4f4bbb01e33ef0046410-1559555595911-driver-svc"
servicePort: 4040
path: "/sparkui/1b8dd23c-d07c-4f4b-bb01-e33ef0046410"
tls: []

19/06/03 09:53:32 INFO SparkUIIngressServiceBuilder: Creating Ingress Service.
19/06/03 09:53:32 INFO SparkUIIngressServiceBuilder: Created Ingress Service.
19/06/03 09:53:39 INFO 
KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Registered executor 
NettyRpcEndpointRef(spark-client://Executor) (10.244.16.52:52800) with ID 1
19/06/03 09:53:39 INFO KubernetesClusterSchedulerBackend: SchedulerBackend is 
ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
19/06/03 09:53:39 INFO BlockManagerMasterEndpoint: Registering block manager 
10.244.16.52:34657 with 3.8 GB RAM, BlockManagerId(1, 10.244.16.52, 34657, None)
19/06/03 09:53:39 INFO SharedState: Setting hive.metastore.warehouse.dir 
('null') to the value of spark.sql.warehouse.dir ('file:/spark-warehouse').
19/06/03 09:53:39 INFO SharedState: Warehouse path is 'file:/spark-warehouse'.
19/06/03 09:53:40 INFO StateStoreCoordinatorRef: Registered 
StateStoreCoordinator endpoint
19/06/03 09:53:40 INFO AsyncEmitter: Worker thread will flush remaining events 
before exiting.
19/06/03 09:53:40 INFO SparkContext: Invoking stop() from shutdown hook
19/06/03 09:53:40 INFO JobListener: Thread 70 called onApplicationEnd...
19/06/03 09:53:40 INFO JobListener: Uploading spark job results to tenant 
results bucket
19/06/03 09:53:40 INFO AsyncEmitter: Queue flush finished successfully within 
timeout.
19/06/03 09:53:40 INFO SparkUI: Stopped Spark web UI at 
http://spark-1b8dd23cd07c4f4bbb01e33ef0046410-1559555595911-driver-svc.24f2k7cztfza.svc:4040
19/06/03 09:53:40 INFO BmcFilesystem: Attempting to initialize filesystem with 
URI oci://dataflow-logs@paasdevsss/
19/06/03 09:53:40 INFO BmcFilesystem: Initialized filesystem for namespace 
paasdevsss and bucket dataflow-logs
19/06/03 09:53:40 INFO BmcDataStoreFactory: Using connector version: 2.7.2.2
19/06/03 09:53:40 INFO Version: 
{"prefix":"spark.sss","secondaryPrefixes":[],"properties":{"StartTime":"1559555620222"},"metrics":{"pyspark_min.py.driver.BlockManager.disk.diskSpaceUsed_MB":"0.0","pyspark_min.py.driver.BlockManager.memory.maxMem_MB":"7584.0","pyspark_min.py.driver.BlockManager.memory.maxOffHeapMem_MB":"0.0","pyspark_min.py.driver.BlockManager.memory.maxOnHeapMem_MB":"7584.0","pyspark_min.py.driver.BlockManager.memory.memUsed_MB":"0.0","pyspark_min.py.driver.BlockManager.memory.offHeapMemUsed_MB":"0.0","pyspark_min.py.driver.BlockManager.memory.onHeapMemUsed_MB":"0.0","pyspark_min.py.driver.BlockManager.memory.remainingMem_MB":"7584.0","pyspark_min.py.driver.BlockManager.memory.remainingOffHeapMem_MB":"0.0","pyspark_min.py.driver.BlockManager.memory.remainingOnHeapMem_MB":"7584.0","pyspark_min.py.driver.DAGScheduler.job.activeJobs":"0.0","pyspark_min.py.driver.DAGScheduler.job.allJobs":"0.0","pyspark_min.py.driver.DAGScheduler.stage.failedStages":"0.0","pyspark_min.py.driver.DAGScheduler.stage.runningStages":"0.0","pyspark_min.py.driver.DAGScheduler.stage.waitingStages":"0.0","pyspark_min.py.driver.LiveListenerBus.queue.appStatus.size":"0.0","pyspark_min.py.driver.LiveListenerBus.queue.executorManagement.size":"0.0","pyspark_min.py.driver.LiveListenerBus.queue.shared.size":"0.0","pyspark_min.py.driver.HiveExternalCatalog.fileCacheHits":"0.0","pyspark_min.py.driver.HiveExternalCatalog.filesDiscovered":"0.0","pyspark_min.py.driver.HiveExternalCatalog.hiveClientCalls":"0.0","pyspark_min.py.driver.HiveExternalCatalog.parallelListingJobCount":"0.0","pyspark_min.py.driver.HiveExternalCatalog.partitionsFetched":"0.0","pyspark_min.py.driver.LiveListenerBus.numEventsPosted":"6.0","pyspark_min.py.driver.LiveListenerBus.queue.appStatus.numDroppedEvents":"0.0","pyspark_min.py.driver.LiveListenerBus.queue.executorManagement.numDroppedEvents":"0.0","pyspark_min.py.driver.LiveListenerBus.queue.shared.numDroppedEvents":"0.0","pyspark_min.py.driver.DAGScheduler.messageProcessingTime.rate":"0.015991
117074135343","pyspark_min.py.driver.LiveListenerBus.listenerProcessingTime.oracle.dfcs.spark.listener.JobListener.rate":"0.6","pyspark_min.py.driver.LiveListenerBus.listenerProcessingTime.org.apache.spark.HeartbeatReceiver.rate":"0.552026648777594","pyspark_min.py.driver.LiveListenerBus.listenerProcessingTime.org.apache.spark.status.AppStatusListener.rate":"0.552026648777594","pyspark_min.py.driver.LiveListenerBus.queue.appStatus.listenerProcessingTime.rate":"0.552026648777594","pyspark_min.py.driver.LiveListenerBus.queue.executorManagement.listenerProcessingTime.rate":"0.552026648777594","pyspark_min.py.driver.LiveListenerBus.queue.shared.listenerProcessingTime.rate":"0.6"}}
19/06/03 09:53:40 INFO KubernetesClusterSchedulerBackend: Shutting down all 
executors
19/06/03 09:53:40 INFO 
KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asking each 
executor to shut down
19/06/03 09:53:40 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has 
been closed (this is expected if the application is shutting down.)
19/06/03 09:53:40 INFO DefaultConfigurator: Setting connector provider to 
HttpUrlConnectorProvider
19/06/03 09:53:40 INFO ObjectStorageClient: Setting endpoint to 
https://objectstorage.us-phoenix-1.oraclecloud.com
19/06/03 09:53:40 INFO BmcDataStoreFactory: Using endpoint 
https://objectstorage.us-phoenix-1.oraclecloud.com
19/06/03 09:53:40 INFO ObjectStorageClient: Setting endpoint to 
https://objectstorage.us-phoenix-1.oraclecloud.com
19/06/03 09:53:40 INFO BmcDataStore: Using upload configuration: %s
19/06/03 09:53:40 INFO BmcFilesystem: Setting working directory to 
oci://dataflow-logs@paasdevsss/user/root, and initialized uri to 
oci://dataflow-logs@paasdevsss
19/06/03 09:53:40 INFO X509FederationClient: Refreshing session keys.
19/06/03 09:53:40 INFO X509FederationClient: Getting security token from the 
auth server
JobListener: Uploading file /logs/sparkjob/spark_stdout.log to 
/83f969ca-083e-46b6-89a7-1f1d54f85167/spark_stdout.log
RequestBuilder: No Progressable passed, not reporting progress.
JobListener: Uploading file /logs/sparkjob/spark_stderr.log to 
/83f969ca-083e-46b6-89a7-1f1d54f85167/spark_stderr.log
RequestBuilder: No Progressable passed, not reporting progress.
SparkUIIngressServiceBuilder: Deleting Ingress Service.
SparkUIIngressServiceBuilder: Deleted Ingress Service.
MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
MemoryStore: MemoryStore cleared
BlockManager: BlockManager stopped
BlockManagerMaster: BlockManagerMaster stopped
TelemetrySink: Stopping ScheduledMetricReporter
TelemetrySink: Stopping metric emitter
OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: 
OutputCommitCoordinator stopped!
SparkContext: Successfully stopped SparkContext
ShutdownHookManager: Shutdown hook called
ShutdownHookManager: Deleting directory 
/var/data/spark-4d779a23-103a-436d-9491-1abbf2952f45/spark-64951e10-ce58-4bf8-a38b-6b852a90234f/pyspark-64669e01-2a2c-4300-818d-6947caf0b053
ShutdownHookManager: Deleting directory 
/var/data/spark-4d779a23-103a-436d-9491-1abbf2952f45/spark-64951e10-ce58-4bf8-a38b-6b852a90234f
ShutdownHookManager: Deleting directory 
/tmp/spark-9532d74f-ab0f-48c2-a39f-363a77aeaa9e
{noformat}
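For what it's worth, a plain-Python sketch of the mechanism I suspect here (my assumption, not something confirmed in this ticket): the driver JVM can only exit, and thus run its shutdown hook, once every non-daemon thread has finished. A lingering non-daemon thread (e.g. from a client library) produces exactly this kind of hang, which is easy to reproduce outside Spark:

```python
import subprocess
import sys
import textwrap

# Illustration of the suspected hang mechanism: a process only exits on
# its own once all non-daemon threads have finished. With daemon=True the
# child process below exits immediately; with daemon=False it would block
# for the full 60 s sleep -- the analogue of the hanging driver pod.
script = textwrap.dedent("""
    import threading, time
    t = threading.Thread(target=time.sleep, args=(60,), daemon=True)
    t.start()
    print("main done")
""")

result = subprocess.run(
    [sys.executable, "-c", script],
    capture_output=True, text=True, timeout=15,
)
print(result.returncode, result.stdout.strip())
```

Flipping `daemon=True` to `daemon=False` makes the child block until the sleep finishes, even though its main thread has already returned.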

> driver pod hangs with pyspark 2.4.3 and master on kubernetes
> -----------------------------------------------------------
>
>                 Key: SPARK-27927
>                 URL: https://issues.apache.org/jira/browse/SPARK-27927
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes
>    Affects Versions: 3.0.0, 2.4.3
>         Environment: k8s 1.11.9
> spark 2.4.3 and master branch.
>            Reporter: Edwin Biemond
>            Priority: Major
>
> When we run a simple pyspark script on Spark 2.4.3 or 3.0.0, the driver pod 
> hangs and never calls the shutdown hook. 
> {code:java}
> #!/usr/bin/env python
> from __future__ import print_function
> import os
> import os.path
> import sys
> # Are we really in Spark?
> from pyspark.sql import SparkSession
> spark = SparkSession.builder.appName('hello_world').getOrCreate()
> print('Our Spark version is {}'.format(spark.version))
> print('Spark context information: {} parallelism={} python version={}'.format(
>     str(spark.sparkContext),
>     spark.sparkContext.defaultParallelism,
>     spark.sparkContext.pythonVer
> ))
> {code}
> When we run this on Kubernetes, the driver and executor just hang. We do see 
> the output of the Python script. 
> {noformat}
> bash-4.2# cat stdout.log
> Our Spark version is 2.4.3
> Spark context information: <SparkContext 
> master=k8s://https://kubernetes.default.svc:443 appName=hello_world> 
> parallelism=2 python version=3.6{noformat}
> What works:
>  * a simple Python script with a print works fine on 2.4.3 and 3.0.0
>  * the same setup on 2.4.0
>  * 2.4.3 spark-submit with the above pyspark script
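A possible workaround (my assumption, not verified against this exact bug) is to stop the context explicitly at the end of the script, so teardown does not depend on the JVM shutdown hook firing:

```python
# Hypothetical workaround sketch: call spark.stop() explicitly instead of
# relying on the shutdown hook. Guarded with a try/except so the snippet
# also runs where pyspark is not installed.
def main():
    try:
        from pyspark.sql import SparkSession
    except ImportError:
        return 'pyspark not installed; sketch only'
    spark = SparkSession.builder.appName('hello_world').getOrCreate()
    print('Our Spark version is {}'.format(spark.version))
    spark.stop()  # explicit stop rather than waiting for the shutdown hook
    return 'stopped'

print(main())
```

Since spark-submit reportedly works, an explicit stop in the client-mode path may at least narrow down where the hook is getting lost.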



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
