prakash gurung created SPARK-46128:
--------------------------------------

Summary: External scheduler cannot be instantiated
Key: SPARK-46128
URL: https://issues.apache.org/jira/browse/SPARK-46128
Project: Spark
Issue Type: Bug
Components: Kubernetes, Spark Core, Spark Submit
Affects Versions: 3.5.0, 3.1.2
Reporter: prakash gurung

Spark submit driver fails to resolve "kubernetes.default.svc" when trying to create executors.

Spark versions tried:
* 3.5.0
* 3.1.2

Kubernetes cluster on premises using kubeadm:
* Kubernetes version: v1.28.2
* OS: Ubuntu 22.04.1 (Jammy)
* Container Runtime: 1.6.24

Complete error:
{code:java}
+ shift 1
+ CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@")
+ exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=10.48.131.135 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.examples.SparkPi local:///opt/spark/examples/jars/spark-examples_2.12-3.1.2.jar
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/opt/spark/jars/spark-unsafe_2.12-3.1.2.jar) to constructor java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
23/11/22 03:27:20 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
23/11/22 03:27:20 INFO SparkContext: Running Spark version 3.1.2
23/11/22 03:27:20 INFO ResourceUtils: ==============================================================
23/11/22 03:27:20 INFO ResourceUtils: No custom resources configured for spark.driver.
23/11/22 03:27:20 INFO ResourceUtils: ==============================================================
23/11/22 03:27:20 INFO SparkContext: Submitted application: Spark Pi
23/11/22 03:27:20 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 1024, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0)
23/11/22 03:27:20 INFO ResourceProfile: Limiting resource is cpus at 1 tasks per executor
23/11/22 03:27:20 INFO ResourceProfileManager: Added ResourceProfile id: 0
23/11/22 03:27:20 INFO SecurityManager: Changing view acls to: 185,root
23/11/22 03:27:20 INFO SecurityManager: Changing modify acls to: 185,root
23/11/22 03:27:20 INFO SecurityManager: Changing view acls groups to:
23/11/22 03:27:20 INFO SecurityManager: Changing modify acls groups to:
23/11/22 03:27:20 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(185, root); groups with view permissions: Set(); users with modify permissions: Set(185, root); groups with modify permissions: Set()
23/11/22 03:27:20 INFO Utils: Successfully started service 'sparkDriver' on port 7078.
23/11/22 03:27:20 INFO SparkEnv: Registering MapOutputTracker
23/11/22 03:27:20 INFO SparkEnv: Registering BlockManagerMaster
23/11/22 03:27:20 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
23/11/22 03:27:20 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
23/11/22 03:27:20 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
23/11/22 03:27:20 INFO DiskBlockManager: Created local directory at /var/data/spark-9239c605-130e-4feb-b050-a33546d330bb/blockmgr-dd78ca51-ba55-4da9-82e3-6d4f17b69753
23/11/22 03:27:20 INFO MemoryStore: MemoryStore started with capacity 413.9 MiB
23/11/22 03:27:20 INFO SparkEnv: Registering OutputCommitCoordinator
23/11/22 03:27:20 INFO Utils: Successfully started service 'SparkUI' on port 4040.
23/11/22 03:27:20 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at http://spark-pi-d465538bf5115d50-driver-svc.default.svc:4040
23/11/22 03:27:20 INFO SparkContext: Added JAR local:///opt/spark/examples/jars/spark-examples_2.12-3.1.2.jar at file:/opt/spark/examples/jars/spark-examples_2.12-3.1.2.jar with timestamp 1700623640246
23/11/22 03:27:20 WARN SparkContext: The jar local:///opt/spark/examples/jars/spark-examples_2.12-3.1.2.jar has been added already. Overwriting of added jars is not supported in the current version.
23/11/22 03:27:20 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file
23/11/22 03:27:41 ERROR SparkContext: Error initializing SparkContext.
org.apache.spark.SparkException: External scheduler cannot be instantiated
    at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2961)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:557)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2672)
    at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:945)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:939)
    at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:30)
    at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.base/java.lang.reflect.Method.invoke(Unknown Source)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Operation: [get] for kind: [Pod] with name: [spark-pi-d465538bf5115d50-driver] in namespace: [default] failed.
    at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64)
    at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:225)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:186)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:84)
    at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$driverPod$1(ExecutorPodsAllocator.scala:75)
    at scala.Option.map(Option.scala:230)
    at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.<init>(ExecutorPodsAllocator.scala:74)
    at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.createSchedulerBackend(KubernetesClusterManager.scala:123)
    at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2955)
    ... 19 more
Caused by: java.net.UnknownHostException: kubernetes.default.svc: Temporary failure in name resolution
    at java.base/java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
    at java.base/java.net.InetAddress$PlatformNameService.lookupAllHostAddr(Unknown Source)
    at java.base/java.net.InetAddress.getAddressesFromNameService(Unknown Source)
    at java.base/java.net.InetAddress$NameServiceAddresses.get(Unknown Source)
    at java.base/java.net.InetAddress.getAllByName0(Unknown Source)
    at java.base/java.net.InetAddress.getAllByName(Unknown Source)
    at java.base/java.net.InetAddress.getAllByName(Unknown Source)
    at okhttp3.Dns$1.lookup(Dns.java:40)
    at okhttp3.internal.connection.RouteSelector.resetNextInetSocketAddress(RouteSelector.java:185)
    at okhttp3.internal.connection.RouteSelector.nextProxy(RouteSelector.java:149)
    at okhttp3.internal.connection.RouteSelector.next(RouteSelector.java:84)
    at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:215)
    at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:135)
    at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:114)
    at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
    at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
    at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
    at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:127)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
    at io.fabric8.kubernetes.client.utils.BackwardsCompatibilityInterceptor.intercept(BackwardsCompatibilityInterceptor.java:135)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
    at io.fabric8.kubernetes.client.utils.OIDCTokenRefreshInterceptor.intercept(OIDCTokenRefreshInterceptor.java:41)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
    at io.fabric8.kubernetes.client.utils.ImpersonatorInterceptor.intercept(ImpersonatorInterceptor.java:68)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
    at io.fabric8.kubernetes.client.utils.HttpClientUtils.lambda$createHttpClient$3(HttpClientUtils.java:151)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
    at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:257)
    at okhttp3.RealCall.execute(RealCall.java:93)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:490)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:451)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:416)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:397)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:933)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:220)
    ... 26 more
23/11/22 03:27:41 INFO SparkUI: Stopped Spark web UI at http://spark-pi-d465538bf5115d50-driver-svc.default.svc:4040
23/11/22 03:27:41 INFO MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
23/11/22 03:27:41 INFO MemoryStore: MemoryStore cleared
23/11/22 03:27:41 INFO BlockManager: BlockManager stopped
23/11/22 03:27:41 INFO BlockManagerMaster: BlockManagerMaster stopped
23/11/22 03:27:41 WARN MetricsSystem: Stopping a MetricsSystem that is not running
23/11/22 03:27:41 INFO OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
23/11/22 03:27:41 INFO SparkContext: Successfully stopped SparkContext
Exception in thread "main" org.apache.spark.SparkException: External scheduler cannot be instantiated
    at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2961)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:557)
    at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2672)
    at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:945)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:939)
    at org.apache.spark.examples.SparkPi$.main(SparkPi.scala:30)
    at org.apache.spark.examples.SparkPi.main(SparkPi.scala)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.base/java.lang.reflect.Method.invoke(Unknown Source)
    at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
    at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
    at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:180)
    at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
    at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
    at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1039)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1048)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: io.fabric8.kubernetes.client.KubernetesClientException: Operation: [get] for kind: [Pod] with name: [spark-pi-d465538bf5115d50-driver] in namespace: [default] failed.
    at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:64)
    at io.fabric8.kubernetes.client.KubernetesClientException.launderThrowable(KubernetesClientException.java:72)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:225)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:186)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.get(BaseOperation.java:84)
    at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$driverPod$1(ExecutorPodsAllocator.scala:75)
    at scala.Option.map(Option.scala:230)
    at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.<init>(ExecutorPodsAllocator.scala:74)
    at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterManager.createSchedulerBackend(KubernetesClusterManager.scala:123)
    at org.apache.spark.SparkContext$.org$apache$spark$SparkContext$$createTaskScheduler(SparkContext.scala:2955)
    ... 19 more
Caused by: java.net.UnknownHostException: kubernetes.default.svc: Temporary failure in name resolution
    at java.base/java.net.Inet6AddressImpl.lookupAllHostAddr(Native Method)
    at java.base/java.net.InetAddress$PlatformNameService.lookupAllHostAddr(Unknown Source)
    at java.base/java.net.InetAddress.getAddressesFromNameService(Unknown Source)
    at java.base/java.net.InetAddress$NameServiceAddresses.get(Unknown Source)
    at java.base/java.net.InetAddress.getAllByName0(Unknown Source)
    at java.base/java.net.InetAddress.getAllByName(Unknown Source)
    at java.base/java.net.InetAddress.getAllByName(Unknown Source)
    at okhttp3.Dns$1.lookup(Dns.java:40)
    at okhttp3.internal.connection.RouteSelector.resetNextInetSocketAddress(RouteSelector.java:185)
    at okhttp3.internal.connection.RouteSelector.nextProxy(RouteSelector.java:149)
    at okhttp3.internal.connection.RouteSelector.next(RouteSelector.java:84)
    at okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:215)
    at okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:135)
    at okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:114)
    at okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
    at okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
    at okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
    at okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:127)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
    at io.fabric8.kubernetes.client.utils.BackwardsCompatibilityInterceptor.intercept(BackwardsCompatibilityInterceptor.java:135)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
    at io.fabric8.kubernetes.client.utils.OIDCTokenRefreshInterceptor.intercept(OIDCTokenRefreshInterceptor.java:41)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
    at io.fabric8.kubernetes.client.utils.ImpersonatorInterceptor.intercept(ImpersonatorInterceptor.java:68)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
    at io.fabric8.kubernetes.client.utils.HttpClientUtils.lambda$createHttpClient$3(HttpClientUtils.java:151)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
    at okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
    at okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:257)
    at okhttp3.RealCall.execute(RealCall.java:93)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:490)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:451)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:416)
    at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleGet(OperationSupport.java:397)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleGet(BaseOperation.java:933)
    at io.fabric8.kubernetes.client.dsl.base.BaseOperation.getMandatory(BaseOperation.java:220)
    ... 26 more
23/11/22 03:27:41 INFO ShutdownHookManager: Shutdown hook called
23/11/22 03:27:41 INFO ShutdownHookManager: Deleting directory /tmp/spark-2f306210-bd49-47ad-a12b-db283e4ca6fd
23/11/22 03:27:41 INFO ShutdownHookManager: Deleting directory /var/data/spark-9239c605-130e-4feb-b050-a33546d330bb/spark-8840557b-371c-413e-a29c-a1e8f2ec748a
{code}

Similar issue: https://issues.apache.org/jira/browse/SPARK-29640
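
For triage: the innermost cause in the trace is the name lookup for kubernetes.default.svc failing ("Temporary failure in name resolution"), before any request reaches the API server. A minimal sketch for repeating that lookup outside Spark, for example from a debug pod in the same namespace, might look like the following. It is Python rather than JVM code; the function name `check_resolution`, the default port 443, and the injectable `resolver` parameter are illustrative assumptions and are not part of Spark or the fabric8 client.

```python
import socket
from typing import Callable, List, Tuple


def check_resolution(
    host: str,
    port: int = 443,  # assumption: the API server service listens on 443
    resolver: Callable = socket.getaddrinfo,
) -> Tuple[bool, List[str]]:
    """Try to resolve `host` and report the outcome.

    Returns (True, sorted unique IP addresses) on success, or
    (False, [error text]) when the resolver raises socket.gaierror.
    """
    try:
        infos = resolver(host, port, proto=socket.IPPROTO_TCP)
    except socket.gaierror as exc:
        # This is the Python analogue of java.net.UnknownHostException.
        return False, [str(exc)]
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] holds the IP address for both IPv4 and IPv6.
    addresses = sorted({info[4][0] for info in infos})
    return True, addresses
```

Calling `check_resolution("kubernetes.default.svc")` from a pod in the affected namespace would be expected to return `(True, [...])` when cluster DNS is healthy. A `(False, [...])` result carrying the same "Temporary failure in name resolution" text would suggest the problem lies in pod DNS configuration rather than in Spark itself.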