[ https://issues.apache.org/jira/browse/SPARK-27927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16854471#comment-16854471 ]
Edwin Biemond commented on SPARK-27927:
---------------------------------------

Same, but for 2.4.0; in this case the shutdown hook is called on the driver.

{noformat}
Our Spark version is 2.4.0
Spark context information: <SparkContext master=k8s://https://kubernetes.default.svc:443 appName=hello_world> parallelism=2 python version=3.6
19/06/03 09:53:23 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
19/06/03 09:53:28 INFO SparkContext: Running Spark version 2.4.0
19/06/03 09:53:28 INFO SparkContext: Submitted application: hello_world
19/06/03 09:53:28 INFO SecurityManager: Changing view acls to: root
19/06/03 09:53:28 INFO SecurityManager: Changing modify acls to: root
19/06/03 09:53:28 INFO SecurityManager: Changing view acls groups to:
19/06/03 09:53:28 INFO SecurityManager: Changing modify acls groups to:
19/06/03 09:53:28 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
19/06/03 09:53:28 INFO Utils: Successfully started service 'sparkDriver' on port 7078.
19/06/03 09:53:28 INFO SparkEnv: Registering MapOutputTracker
19/06/03 09:53:28 INFO SparkEnv: Registering BlockManagerMaster
19/06/03 09:53:28 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
19/06/03 09:53:28 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
19/06/03 09:53:28 INFO DiskBlockManager: Created local directory at /var/data/spark-4d779a23-103a-436d-9491-1abbf2952f45/blockmgr-aa7e1365-4aa5-409e-aaea-d2069e9a7ccf
19/06/03 09:53:28 INFO MemoryStore: MemoryStore started with capacity 3.6 GB
19/06/03 09:53:28 INFO SparkEnv: Registering OutputCommitCoordinator
19/06/03 09:53:29 INFO Utils: Successfully started service 'SparkUI' on port 4040.
19/06/03 09:53:29 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://spark-1b8dd23cd07c4f4bbb01e33ef0046410-1559555595911-driver-svc.24f2k7cztfza.svc:4040
19/06/03 09:53:29 INFO SparkContext: Added file oci://code-assets@paasdevsss/pyspark_min.py at oci://code-assets@paasdevsss/pyspark_min.py with timestamp 1559555609257
19/06/03 09:53:29 INFO Utils: Fetching oci://code-assets@paasdevsss/pyspark_min.py to /var/data/spark-4d779a23-103a-436d-9491-1abbf2952f45/spark-64951e10-ce58-4bf8-a38b-6b852a90234f/userFiles-39b90ce9-3b44-4ed3-b1c8-9d0863b3c445/fetchFileTemp8597444061541435463.tmp
19/06/03 09:53:30 INFO ExecutorPodsAllocator: Going to request 1 executors from Kubernetes.
19/06/03 09:53:30 INFO Version: HV000001: Hibernate Validator 5.2.4.Final
19/06/03 09:53:30 INFO Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 7079.
19/06/03 09:53:30 INFO NettyBlockTransferService: Server created on spark-1b8dd23cd07c4f4bbb01e33ef0046410-1559555595911-driver-svc.24f2k7cztfza.svc:7079
19/06/03 09:53:30 INFO BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
19/06/03 09:53:30 INFO BlockManagerMaster: Registering BlockManager BlockManagerId(driver, spark-1b8dd23cd07c4f4bbb01e33ef0046410-1559555595911-driver-svc.24f2k7cztfza.svc, 7079, None)
19/06/03 09:53:30 INFO BlockManagerMasterEndpoint: Registering block manager spark-1b8dd23cd07c4f4bbb01e33ef0046410-1559555595911-driver-svc.24f2k7cztfza.svc:7079 with 3.6 GB RAM, BlockManagerId(driver, spark-1b8dd23cd07c4f4bbb01e33ef0046410-1559555595911-driver-svc.24f2k7cztfza.svc, 7079, None)
19/06/03 09:53:30 INFO BlockManagerMaster: Registered BlockManager BlockManagerId(driver, spark-1b8dd23cd07c4f4bbb01e33ef0046410-1559555595911-driver-svc.24f2k7cztfza.svc, 7079, None)
19/06/03 09:53:30 INFO BlockManager: Initialized BlockManager: BlockManagerId(driver, spark-1b8dd23cd07c4f4bbb01e33ef0046410-1559555595911-driver-svc.24f2k7cztfza.svc, 7079, None)
19/06/03 09:53:30 INFO TelemetrySink: Instantiating TelemetrySink
19/06/03 09:53:30 INFO Services: Registering new service: Services.BasicService(serviceName=AUTH, serviceEndpointPrefix=auth, serviceEndpointTemplate=null)
19/06/03 09:53:31 INFO Region: Loaded service 'AUTH' endpoint mappings: {US_PHOENIX_1=https://auth.us-phoenix-1.oraclecloud.com, EU_FRANKFURT_1=https://auth.eu-frankfurt-1.oraclecloud.com, US_GOV_CHICAGO_1=https://auth.us-gov-chicago-1.oraclegovcloud.com, CA_TORONTO_1=https://auth.ca-toronto-1.oraclecloud.com, US_ASHBURN_1=https://auth.us-ashburn-1.oraclecloud.com, US_LUKE_1=https://auth.us-luke-1.oraclegovcloud.com, UK_LONDON_1=https://auth.uk-london-1.oraclecloud.com, US_LANGLEY_1=https://auth.us-langley-1.oraclegovcloud.com, US_GOV_PHOENIX_1=https://auth.us-gov-phoenix-1.oraclegovcloud.com, US_GOV_ASHBURN_1=https://auth.us-gov-ashburn-1.oraclegovcloud.com}
19/06/03 09:53:31 INFO URLBasedX509CertificateSupplier: suppressX509Workaround flag set to false
19/06/03 09:53:31 INFO JavaRuntimeUtils: Determined JRE version as Java_8
19/06/03 09:53:31 INFO DefaultConfigurator: Setting connector provider to HttpUrlConnectorProvider
19/06/03 09:53:31 INFO OracleHttpClientBuilder: DynamicSslContextProviderConfig is not configured. Attempting to use tlsConfig
19/06/03 09:53:32 INFO OverlayHttpClientBuilder: DynamicSslContextProviderConfig is not configured. Attempting to use tlsConfig
19/06/03 09:53:32 INFO TelemetrySink: DianogaReporter created and registered with metrics
19/06/03 09:53:32 INFO TelemetrySink: ScheduledMetricReporter created
19/06/03 09:53:32 INFO TelemetrySink: Starting ScheduledMetricReporter
19/06/03 09:53:32 INFO SparkContext: Registered listener oracle.dfcs.spark.listener.JobListener
19/06/03 09:53:32 INFO JobListener: Thread 70 called onApplicationStart...
19/06/03 09:53:32 INFO SparkUIIngressServiceBuilder: Intialize SparkUIIngressService using SparkConf...
19/06/03 09:53:32 INFO SparkUIIngressServiceBuilder: masterURL - https://kubernetes.default.svc:443, nameSpace - 24f2k7cztfza, backendServiceName - spark-1b8dd23cd07c4f4bbb01e33ef0046410-1559555595911-driver-svc, ingressServiceName - spark-1b8dd23cd07c4f4bbb01e33ef0046410-1559555595911-ingress, runId - 1b8dd23c-d07c-4f4b-bb01-e33ef0046410
19/06/03 09:53:32 INFO SparkUIIngressServiceBuilder: Building SparkUIIngressService...
19/06/03 09:53:32 INFO SparkUIIngressServiceBuilder: ---
apiVersion: "extensions/v1beta1"
kind: "Ingress"
metadata:
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: "/"
    nginx.ingress.kubernetes.io/configuration-snippet: "rewrite /sparkui/1b8dd23c-d07c-4f4b-bb01-e33ef0046410/(.*)$ /$1 break;\nproxy_set_header Accept-Encoding \"\";\nsub_filter_types text/html application/javascript;\nsub_filter \"/static/\" \"/sparkui/1b8dd23c-d07c-4f4b-bb01-e33ef0046410/static/\";\nsub_filter \"/jobs/\" \"/sparkui/1b8dd23c-d07c-4f4b-bb01-e33ef0046410/jobs/\";\nsub_filter \"/stages/\" \"/sparkui/1b8dd23c-d07c-4f4b-bb01-e33ef0046410/stages/\";\nsub_filter \"/storage/\" \"/sparkui/1b8dd23c-d07c-4f4b-bb01-e33ef0046410/storage/\";\nsub_filter \"/environment/\" \"/sparkui/1b8dd23c-d07c-4f4b-bb01-e33ef0046410/environment/\";\nsub_filter \"/executors/\" \"/sparkui/1b8dd23c-d07c-4f4b-bb01-e33ef0046410/executors/\";\nsub_filter \"/streaming/\" \"/sparkui/1b8dd23c-d07c-4f4b-bb01-e33ef0046410/streaming/\";\nsub_filter \"/SQL/\" \"/sparkui/1b8dd23c-d07c-4f4b-bb01-e33ef0046410/SQL/\";\nsub_filter \"/api/\" \"/sparkui/1b8dd23c-d07c-4f4b-bb01-e33ef0046410/api/\";\nsub_filter \"</head>\" \"<script src='https://cdnjs.cloudflare.com/ajax/libs/iframe-resizer/3.6.5/iframeResizer.contentWindow.js'></script></head>\";\nsub_filter_once off;\n"
    nginx.ingress.kubernetes.io/proxy-redirect-from: "http://$host/"
    nginx.ingress.kubernetes.io/proxy-redirect-to: "$scheme://$host/sparkui/1b8dd23c-d07c-4f4b-bb01-e33ef0046410/"
    nginx.ingress.kubernetes.io/ssl-redirect: "false"
  finalizers: []
  labels:
    app: "spark-1b8dd23cd07c4f4bbb01e33ef0046410-1559555595911-ingress"
  name: "spark-1b8dd23cd07c4f4bbb01e33ef0046410-1559555595911-ingress"
  namespace: "24f2k7cztfza"
  ownerReferences: []
spec:
  rules:
  - http:
      paths:
      - backend:
          serviceName: "spark-1b8dd23cd07c4f4bbb01e33ef0046410-1559555595911-driver-svc"
          servicePort: 4040
        path: "/sparkui/1b8dd23c-d07c-4f4b-bb01-e33ef0046410"
  tls: []
19/06/03 09:53:32 INFO SparkUIIngressServiceBuilder: Creating Ingress Service.
19/06/03 09:53:32 INFO SparkUIIngressServiceBuilder: Created Ingress Service.
19/06/03 09:53:39 INFO KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Registered executor NettyRpcEndpointRef(spark-client://Executor) (10.244.16.52:52800) with ID 1
19/06/03 09:53:39 INFO KubernetesClusterSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
19/06/03 09:53:39 INFO BlockManagerMasterEndpoint: Registering block manager 10.244.16.52:34657 with 3.8 GB RAM, BlockManagerId(1, 10.244.16.52, 34657, None)
19/06/03 09:53:39 INFO SharedState: Setting hive.metastore.warehouse.dir ('null') to the value of spark.sql.warehouse.dir ('file:/spark-warehouse').
19/06/03 09:53:39 INFO SharedState: Warehouse path is 'file:/spark-warehouse'.
19/06/03 09:53:40 INFO StateStoreCoordinatorRef: Registered StateStoreCoordinator endpoint
19/06/03 09:53:40 INFO AsyncEmitter: Worker thread will flush remaining events before exiting.
19/06/03 09:53:40 INFO SparkContext: Invoking stop() from shutdown hook
19/06/03 09:53:40 INFO JobListener: Thread 70 called onApplicationEnd...
19/06/03 09:53:40 INFO JobListener: Uploading spark job results to tenant results bucket
19/06/03 09:53:40 INFO AsyncEmitter: Queue flush finished successfully within timeout.
19/06/03 09:53:40 INFO SparkUI: Stopped Spark web UI at http://spark-1b8dd23cd07c4f4bbb01e33ef0046410-1559555595911-driver-svc.24f2k7cztfza.svc:4040
19/06/03 09:53:40 INFO BmcFilesystem: Attempting to initialize filesystem with URI oci://dataflow-logs@paasdevsss/
19/06/03 09:53:40 INFO BmcFilesystem: Initialized filesystem for namespace paasdevsss and bucket dataflow-logs
19/06/03 09:53:40 INFO BmcDataStoreFactory: Using connector version: 2.7.2.2
19/06/03 09:53:40 INFO Version: {"prefix":"spark.sss","secondaryPrefixes":[],"properties":{"StartTime":"1559555620222"},"metrics":{"pyspark_min.py.driver.BlockManager.disk.diskSpaceUsed_MB":"0.0","pyspark_min.py.driver.BlockManager.memory.maxMem_MB":"7584.0","pyspark_min.py.driver.BlockManager.memory.maxOffHeapMem_MB":"0.0","pyspark_min.py.driver.BlockManager.memory.maxOnHeapMem_MB":"7584.0","pyspark_min.py.driver.BlockManager.memory.memUsed_MB":"0.0","pyspark_min.py.driver.BlockManager.memory.offHeapMemUsed_MB":"0.0","pyspark_min.py.driver.BlockManager.memory.onHeapMemUsed_MB":"0.0","pyspark_min.py.driver.BlockManager.memory.remainingMem_MB":"7584.0","pyspark_min.py.driver.BlockManager.memory.remainingOffHeapMem_MB":"0.0","pyspark_min.py.driver.BlockManager.memory.remainingOnHeapMem_MB":"7584.0","pyspark_min.py.driver.DAGScheduler.job.activeJobs":"0.0","pyspark_min.py.driver.DAGScheduler.job.allJobs":"0.0","pyspark_min.py.driver.DAGScheduler.stage.failedStages":"0.0","pyspark_min.py.driver.DAGScheduler.stage.runningStages":"0.0","pyspark_min.py.driver.DAGScheduler.stage.waitingStages":"0.0","pyspark_min.py.driver.LiveListenerBus.queue.appStatus.size":"0.0","pyspark_min.py.driver.LiveListenerBus.queue.executorManagement.size":"0.0","pyspark_min.py.driver.LiveListenerBus.queue.shared.size":"0.0","pyspark_min.py.driver.HiveExternalCatalog.fileCacheHits":"0.0","pyspark_min.py.driver.HiveExternalCatalog.filesDiscovered":"0.0","pyspark_min.py.driver.HiveExternalCatalog.hiveClientCalls":"0.0","pyspark_min.py.driver.HiveExternalCatalog.parallelListingJobCount":"0.0","pyspark_min.py.driver.HiveExternalCatalog.partitionsFetched":"0.0","pyspark_min.py.driver.LiveListenerBus.numEventsPosted":"6.0","pyspark_min.py.driver.LiveListenerBus.queue.appStatus.numDroppedEvents":"0.0","pyspark_min.py.driver.LiveListenerBus.queue.executorManagement.numDroppedEvents":"0.0","pyspark_min.py.driver.LiveListenerBus.queue.shared.numDroppedEvents":"0.0","pyspark_min.py.driver.DAGScheduler.messageProcessingTime.rate":"0.015991117074135343","pyspark_min.py.driver.LiveListenerBus.listenerProcessingTime.oracle.dfcs.spark.listener.JobListener.rate":"0.6","pyspark_min.py.driver.LiveListenerBus.listenerProcessingTime.org.apache.spark.HeartbeatReceiver.rate":"0.552026648777594","pyspark_min.py.driver.LiveListenerBus.listenerProcessingTime.org.apache.spark.status.AppStatusListener.rate":"0.552026648777594","pyspark_min.py.driver.LiveListenerBus.queue.appStatus.listenerProcessingTime.rate":"0.552026648777594","pyspark_min.py.driver.LiveListenerBus.queue.executorManagement.listenerProcessingTime.rate":"0.552026648777594","pyspark_min.py.driver.LiveListenerBus.queue.shared.listenerProcessingTime.rate":"0.6"}}
19/06/03 09:53:40 INFO KubernetesClusterSchedulerBackend: Shutting down all executors
19/06/03 09:53:40 INFO KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asking each executor to shut down
19/06/03 09:53:40 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed (this is expected if the application is shutting down.)
19/06/03 09:53:40 INFO DefaultConfigurator: Setting connector provider to HttpUrlConnectorProvider
19/06/03 09:53:40 INFO ObjectStorageClient: Setting endpoint to https://objectstorage.us-phoenix-1.oraclecloud.com
19/06/03 09:53:40 INFO BmcDataStoreFactory: Using endpoint https://objectstorage.us-phoenix-1.oraclecloud.com
19/06/03 09:53:40 INFO ObjectStorageClient: Setting endpoint to https://objectstorage.us-phoenix-1.oraclecloud.com
19/06/03 09:53:40 INFO BmcDataStore: Using upload configuration: %s
19/06/03 09:53:40 INFO BmcFilesystem: Setting working directory to oci://dataflow-logs@paasdevsss/user/root, and initialized uri to oci://dataflow-logs@paasdevsss
19/06/03 09:53:40 INFO X509FederationClient: Refreshing session keys.
19/06/03 09:53:40 INFO X509FederationClient: Getting security token from the auth server
JobListener: Uploading file /logs/sparkjob/spark_stdout.log to /83f969ca-083e-46b6-89a7-1f1d54f85167/spark_stdout.log
RequestBuilder: No Progressable passed, not reporting progress.
JobListener: Uploading file /logs/sparkjob/spark_stderr.log to /83f969ca-083e-46b6-89a7-1f1d54f85167/spark_stderr.log
RequestBuilder: No Progressable passed, not reporting progress.
SparkUIIngressServiceBuilder: Deleting Ingress Service.
SparkUIIngressServiceBuilder: Deleted Ingress Service.
MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
MemoryStore: MemoryStore cleared
BlockManager: BlockManager stopped
BlockManagerMaster: BlockManagerMaster stopped
TelemetrySink: Stopping ScheduledMetricReporter
TelemetrySink: Stopping metric emitter
OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
SparkContext: Successfully stopped SparkContext
ShutdownHookManager: Shutdown hook called
ShutdownHookManager: Deleting directory /var/data/spark-4d779a23-103a-436d-9491-1abbf2952f45/spark-64951e10-ce58-4bf8-a38b-6b852a90234f/pyspark-64669e01-2a2c-4300-818d-6947caf0b053
ShutdownHookManager: Deleting directory /var/data/spark-4d779a23-103a-436d-9491-1abbf2952f45/spark-64951e10-ce58-4bf8-a38b-6b852a90234f
ShutdownHookManager: Deleting directory /tmp/spark-9532d74f-ab0f-48c2-a39f-363a77aeaa9e
{noformat}

> driver pod hangs with pyspark 2.4.3 and master on kubernetes
> ------------------------------------------------------------
>
>                 Key: SPARK-27927
>                 URL: https://issues.apache.org/jira/browse/SPARK-27927
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes
>    Affects Versions: 3.0.0, 2.4.3
>        Environment: k8s 1.11.9
>                     spark 2.4.3 and master branch.
>           Reporter: Edwin Biemond
>           Priority: Major
>
> When we run a simple pyspark job on Spark 2.4.3 or 3.0.0, the driver pod hangs and never calls the shutdown hook.
> {code:java}
> #!/usr/bin/env python
> from __future__ import print_function
> import os
> import os.path
> import sys
> # Are we really in Spark?
> from pyspark.sql import SparkSession
> spark = SparkSession.builder.appName('hello_world').getOrCreate()
> print('Our Spark version is {}'.format(spark.version))
> print('Spark context information: {} parallelism={} python version={}'.format(
>     str(spark.sparkContext),
>     spark.sparkContext.defaultParallelism,
>     spark.sparkContext.pythonVer
> ))
> {code}
> When we run this on Kubernetes, the driver and executor just hang, even though we see the expected output of the Python script:
> {noformat}
> bash-4.2# cat stdout.log
> Our Spark version is 2.4.3
> Spark context information: <SparkContext master=k8s://https://kubernetes.default.svc:443 appName=hello_world> parallelism=2 python version=3.6
> {noformat}
>
> What works:
> * a simple Python script with just a print works fine on 2.4.3 and 3.0.0
> * the same setup on 2.4.0
> * 2.4.3 spark-submit with the above pyspark script
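
A diagnostic sketch (an editor's assumption, not a confirmed fix from this thread): since the hang appears exactly where the driver would normally rely on the JVM shutdown hook, one way to narrow it down is to stop the session explicitly before the script exits. The script below is the reporter's hello-world job with only an explicit stop() added at the end; everything else is unchanged.

{code:python}
#!/usr/bin/env python
from __future__ import print_function

from pyspark.sql import SparkSession

# Same hello-world job as in the report above.
spark = SparkSession.builder.appName('hello_world').getOrCreate()

print('Our Spark version is {}'.format(spark.version))
print('Spark context information: {} parallelism={} python version={}'.format(
    str(spark.sparkContext),
    spark.sparkContext.defaultParallelism,
    spark.sparkContext.pythonVer
))

# Stop the SparkContext explicitly instead of relying on the JVM
# shutdown hook that this issue reports as never being invoked.
# If the driver pod exits cleanly with this line but hangs without
# it, the problem is isolated to the shutdown-hook path.
spark.stop()
{code}

If the pod still hangs after an explicit stop(), a non-daemon thread left alive in the driver JVM (for example by a custom SparkListener or an SDK client such as those visible in the logs above) would be a plausible suspect, e.g. inspectable with a jstack thread dump of the driver process.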