khwj opened a new issue, #4942:
URL: https://github.com/apache/kyuubi/issues/4942
### Code of Conduct

- [X] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)

### Search before asking

- [X] I have searched in the [issues](https://github.com/apache/kyuubi/issues?q=is%3Aissue) and found no similar issues.

### Describe the bug

The default Kubernetes driver pod name set by [EngineRef.scala](https://github.com/apache/kyuubi/blob/9ff46a3c633534c2266ad8e6316b9fddaa024a6c/kyuubi-server/src/main/scala/org/apache/kyuubi/engine/EngineRef.scala#LL129C32-L129C32) is longer than the 63-character maximum that Kubernetes allows for label values. This is a problem because the driver pod name is then reused as the `spark-driver-pod-name` label on the Spark executor pods, so executor creation fails with an invalid-label error. As a workaround I have set `spark.app.name` to a shorter value, but that makes it harder to identify a specific Spark app by session, user, or group (Kyuubi currently does not set the Spark user or group as Kubernetes labels). One possible truncation approach is sketched after the engine log below.

### Affects Version(s)

1.7.1

### Kyuubi Server Log Output

_No response_

### Kyuubi Engine Log Output

```logtalk
++ id -u + myuid=999 ++ id -g + mygid=1000 + set +e ++ getent passwd 999 + uidentry=hadoop:x:999:1000::/home/hadoop:/bin/bash + set -e + '[' -z hadoop:x:999:1000::/home/hadoop:/bin/bash ']' + '[' -n '' ']' + SPARK_K8S_CMD=driver + [[ driver == executor ]] + SPARK_CLASSPATH=':/usr/lib/spark/jars/*' + env + grep SPARK_JAVA_OPT_ + sort -t_ -k4 -n + sed 's/[^=]*=\(.*\)/\1/g' + readarray -t SPARK_EXECUTOR_JAVA_OPTS + '[' -n '' ']' + '[' -z ']' + '[' -z ']' + '[' -n '' ']' + '[' -z x ']' + SPARK_CLASSPATH='/etc/hadoop/conf::/usr/lib/spark/jars/*' + '[' -z x ']' + SPARK_CLASSPATH='/usr/lib/spark/conf:/etc/hadoop/conf::/usr/lib/spark/jars/*' + '[' -n '' ']' + case "$SPARK_K8S_CMD" in + shift 1 + CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@") + DISABLE_STDOUT_STDERR=0 + '[' -z '' ']' + DISABLE_STDOUT_STDERR=1 + DISABLE_PULLING_CONTAINER_FAILURE=0 + '[' -z '' ']' + DISABLE_PULLING_CONTAINER_FAILURE=1 + '[' -n '' ']' + '[' -n '' ']' + '[' -n '' ']' ++ dirname '' ++ dirname '' + mkdir -p . .
+ '[' -n '' ']' + (( 1 )) + (( DISABLE_PULLING_CONTAINER_FAILURE )) + exec /usr/bin/tini -s -- /usr/lib/spark/bin/spark-submit --conf spark.driver.bindAddress=10.177.40.182 --deploy-mode client --proxy-user khwunchai --properties-file /usr/lib/spark/conf/spark.properties --class org.apache.kyuubi.engine.spark.SparkSQLEngine spark-internal OpenJDK 64-Bit Server VM warning: If the number of processors is expected to increase from one, then you should configure the number of parallel GC threads appropriately using -XX:ParallelGCThreads=N 23/06/08 16:25:12 WARN HadoopFileSystemOwner: found no group information for khwunchai (auth:PROXY) via hadoop (auth:SIMPLE), using khwunchai as primary group 23/06/08 16:25:12 WARN HadoopFileSystemOwner: found no group information for khwunchai (auth:PROXY) via hadoop (auth:SIMPLE), using khwunchai as primary group 23/06/08 16:25:12 WARN HadoopFileSystemOwner: found no group information for khwunchai (auth:PROXY) via hadoop (auth:SIMPLE), using khwunchai as primary group 23/06/08 16:25:13 INFO SignalRegister: Registering signal handler for TERM 23/06/08 16:25:13 INFO SignalRegister: Registering signal handler for HUP 23/06/08 16:25:13 INFO SignalRegister: Registering signal handler for INT 23/06/08 16:25:13 INFO HiveConf: Found configuration file file:/etc/spark/conf/hive-site.xml 23/06/08 16:25:13 INFO SparkContext: Running Spark version 3.3.1-amzn-0 23/06/08 16:25:13 INFO ResourceUtils: ============================================================== 23/06/08 16:25:13 INFO ResourceUtils: No custom resources configured for spark.driver. 23/06/08 16:25:13 INFO ResourceUtils: ============================================================== 23/06/08 16:25:13 INFO SparkContext: Submitted application: kyuubi_USER_SPARK_SQL_khwunchai_default_73bce6a4-df00-403e-bc5d-d1721e515f9d 23/06/08 16:25:13 INFO ResourceProfile: Default ResourceProfile created, executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , memory -> name: memory, amount: 7200, script: , vendor: , offHeap -> name: offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: cpus, amount: 1.0) 23/06/08 16:25:13 INFO ResourceProfile: Limiting resource is cpus at 1 tasks per executor 23/06/08 16:25:13 INFO ResourceProfileManager: Added ResourceProfile id: 0 23/06/08 16:25:13 INFO SecurityManager: Changing view acls to: hadoop,khwunchai 23/06/08 16:25:13 INFO SecurityManager: Changing modify acls to: hadoop,khwunchai 23/06/08 16:25:13 INFO SecurityManager: Changing view acls groups to: 23/06/08 16:25:13 INFO SecurityManager: Changing modify acls groups to: 23/06/08 16:25:13 INFO SecurityManager: SecurityManager: authentication enabled; ui acls disabled; users with view permissions: Set(hadoop, khwunchai); groups with view permissions: Set(); users with modify permissions: Set(hadoop, khwunchai); groups with modify permissions: Set() 23/06/08 16:25:14 INFO Utils: Successfully started service 'sparkDriver' on port 7078. 
23/06/08 16:25:14 INFO SparkEnv: Registering MapOutputTracker 23/06/08 16:25:14 INFO SparkEnv: Registering BlockManagerMaster 23/06/08 16:25:14 INFO BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information 23/06/08 16:25:14 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up 23/06/08 16:25:14 INFO SparkEnv: Registering BlockManagerMasterHeartbeat 23/06/08 16:25:14 INFO DiskBlockManager: Created local directory at /var/data/spark-fce5fc27-0a38-451f-b83f-e3712babead1/blockmgr-30c92157-0852-4216-baa8-2b7964aed441 23/06/08 16:25:14 INFO MemoryStore: MemoryStore started with capacity 1740.0 MiB 23/06/08 16:25:14 INFO SparkEnv: Registering OutputCommitCoordinator 23/06/08 16:25:14 INFO SubResultCacheManager: Sub-result caches are disabled. 23/06/08 16:25:14 INFO Utils: Successfully started service 'SparkUI' on port 4040. 23/06/08 16:25:15 INFO SparkContext: Added JAR file:/tmp/spark-b3f39e11-1a74-40f7-a84b-273d5a2ad361/kyuubi-spark-sql-engine_2.12-1.7.1.jar at spark://spark-187df6889bd35db6-driver-svc.spark-apps.svc:7078/jars/kyuubi-spark-sql-engine_2.12-1.7.1.jar with timestamp 1686241513713 23/06/08 16:25:15 INFO SparkContext: Added JAR local:///usr/share/aws/delta/lib/delta-core.jar at file:/usr/share/aws/delta/lib/delta-core.jar with timestamp 1686241513713 23/06/08 16:25:15 INFO SparkContext: Added JAR local:///usr/share/aws/delta/lib/delta-storage.jar at file:/usr/share/aws/delta/lib/delta-storage.jar with timestamp 1686241513713 23/06/08 16:25:15 INFO SparkKubernetesClientFactory: Auto-configuring K8S client using current context from users K8S config file 23/06/08 16:25:16 INFO KubernetesClientUtils: Skip updating the Pod Labels, as the Label eks-subscription.amazonaws.com/emr.internal.id is already present. 23/06/08 16:25:16 INFO Utils: Using initial executors = 1, max of spark.dynamicAllocation.initialExecutors, spark.dynamicAllocation.minExecutors and spark.executor.instances 23/06/08 16:25:16 WARN FairSchedulableBuilder: Fair Scheduler configuration file not found so jobs will be scheduled in FIFO order. To use fair scheduling, configure pools in fairscheduler.xml or set spark.scheduler.allocation.file to a file that contains the configuration. 23/06/08 16:25:16 INFO FairSchedulableBuilder: Created default pool: default, schedulingMode: FIFO, minShare: 0, weight: 1 23/06/08 16:25:16 INFO ExecutorPodsAllocator: Going to request 1 executors from Kubernetes for ResourceProfile Id: 0, target: 1, known: 0, sharedSlotFromPendingPods: 2147483647. 23/06/08 16:25:16 WARN WatchConnectionManager: Exec Failure: HTTP 400, Status: 400 - Bad Request 23/06/08 16:25:16 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has been closed. 23/06/08 16:25:16 ERROR SparkContext: Error initializing SparkContext. io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: GET at: https://kubernetes.default.svc/api/v1/namespaces/spark-apps/pods?labelSelector=spark-app-selector%3Dspark-0968f860f58f469cba38861033b463bf%2Cspark-role%3Dexecutor%2Cspark-driver-pod-name%3Dkyuubi-user-spark-sql-khwunchai-default-73bce6a4-df00-403e-bc5d-d1721e515f9d-f0ccb3889bd3576e-driver&allowWatchBookmarks=true&watch=true. Message: Bad Request. at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:682) ~[kubernetes-client-5.12.2.jar:?] at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:661) ~[kubernetes-client-5.12.2.jar:?] 
at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.lambda$run$2(WatchConnectionManager.java:126) ~[kubernetes-client-5.12.2.jar:?] at java.util.concurrent.CompletableFuture.uniHandle(CompletableFuture.java:836) ~[?:1.8.0_362] at java.util.concurrent.CompletableFuture$UniHandle.tryFire(CompletableFuture.java:811) ~[?:1.8.0_362] at java.util.concurrent.CompletableFuture.postComplete(CompletableFuture.java:488) ~[?:1.8.0_362] at java.util.concurrent.CompletableFuture.completeExceptionally(CompletableFuture.java:1990) ~[?:1.8.0_362] at io.fabric8.kubernetes.client.okhttp.OkHttpWebSocketImpl$BuilderImpl$1.onFailure(OkHttpWebSocketImpl.java:66) ~[kubernetes-client-5.12.2.jar:?] at okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:571) ~[okhttp-3.12.12.jar:?] at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:198) ~[okhttp-3.12.12.jar:?] at okhttp3.RealCall$AsyncCall.execute(RealCall.java:203) ~[okhttp-3.12.12.jar:?] at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32) ~[okhttp-3.12.12.jar:?] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_362] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_362] at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_362] Suppressed: java.lang.Throwable: waiting here at io.fabric8.kubernetes.client.utils.Utils.waitUntilReady(Utils.java:169) ~[kubernetes-client-5.12.2.jar:?] at io.fabric8.kubernetes.client.utils.Utils.waitUntilReadyOrFail(Utils.java:180) ~[kubernetes-client-5.12.2.jar:?] at io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager.waitUntilReady(WatchConnectionManager.java:96) ~[kubernetes-client-5.12.2.jar:?] at io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:572) ~[kubernetes-client-5.12.2.jar:?] at io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:547) ~[kubernetes-client-5.12.2.jar:?] at io.fabric8.kubernetes.client.dsl.base.BaseOperation.watch(BaseOperation.java:83) ~[kubernetes-client-5.12.2.jar:?] at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsWatchSnapshotSource.start(ExecutorPodsWatchSnapshotSource.scala:64) ~[spark-kubernetes_2.12-3.3.1-amzn-0.jar:3.3.1-amzn-0] at org.apache.spark.scheduler.cluster.k8s.KubernetesClusterSchedulerBackend.start(KubernetesClusterSchedulerBackend.scala:154) ~[spark-kubernetes_2.12-3.3.1-amzn-0.jar:3.3.1-amzn-0] at org.apache.spark.scheduler.TaskSchedulerImpl.start(TaskSchedulerImpl.scala:222) ~[spark-core_2.12-3.3.1-amzn-0.jar:3.3.1-amzn-0] at org.apache.spark.SparkContext.<init>(SparkContext.scala:586) ~[spark-core_2.12-3.3.1-amzn-0.jar:3.3.1-amzn-0] at org.apache.spark.SparkContext$.getOrCreate(SparkContext.scala:2708) ~[spark-core_2.12-3.3.1-amzn-0.jar:3.3.1-amzn-0] at org.apache.spark.sql.SparkSession$Builder.$anonfun$getOrCreate$2(SparkSession.scala:953) ~[spark-sql_2.12-3.3.1-amzn-0.jar:3.3.1-amzn-0] at scala.Option.getOrElse(Option.scala:189) ~[scala-library-2.12.15.jar:?] at org.apache.spark.sql.SparkSession$Builder.getOrCreate(SparkSession.scala:947) ~[spark-sql_2.12-3.3.1-amzn-0.jar:3.3.1-amzn-0] at org.apache.kyuubi.engine.spark.SparkSQLEngine$.createSpark(SparkSQLEngine.scala:253) ~[kyuubi-spark-sql-engine_2.12-1.7.1.jar:?] at org.apache.kyuubi.engine.spark.SparkSQLEngine$.main(SparkSQLEngine.scala:326) ~[kyuubi-spark-sql-engine_2.12-1.7.1.jar:?] at org.apache.kyuubi.engine.spark.SparkSQLEngine.main(SparkSQLEngine.scala) ~[kyuubi-spark-sql-engine_2.12-1.7.1.jar:?] 
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_362] at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_362] at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_362] at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_362] at org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52) ~[spark-core_2.12-3.3.1-amzn-0.jar:3.3.1-amzn-0] at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:1006) ~[spark-core_2.12-3.3.1-amzn-0.jar:3.3.1-amzn-0] at org.apache.spark.deploy.SparkSubmit$$anon$1.run(SparkSubmit.scala:165) ~[spark-core_2.12-3.3.1-amzn-0.jar:3.3.1-amzn-0] at org.apache.spark.deploy.SparkSubmit$$anon$1.run(SparkSubmit.scala:163) ~[spark-core_2.12-3.3.1-amzn-0.jar:3.3.1-amzn-0] at java.security.AccessController.doPrivileged(Native Method) ~[?:1.8.0_362] at javax.security.auth.Subject.doAs(Subject.java:422) ~[?:1.8.0_362] at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1878) ~[hadoop-client-api-3.3.3-amzn-2.jar:?] at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:163) ~[spark-core_2.12-3.3.1-amzn-0.jar:3.3.1-amzn-0] at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203) ~[spark-core_2.12-3.3.1-amzn-0.jar:3.3.1-amzn-0] at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90) ~[spark-core_2.12-3.3.1-amzn-0.jar:3.3.1-amzn-0] at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1095) ~[spark-core_2.12-3.3.1-amzn-0.jar:3.3.1-amzn-0] at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1104) ~[spark-core_2.12-3.3.1-amzn-0.jar:3.3.1-amzn-0] at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) ~[spark-core_2.12-3.3.1-amzn-0.jar:3.3.1-amzn-0] 23/06/08 16:25:16 INFO SparkUI: Stopped Spark web UI at http://spark-187df6889bd35db6-driver-svc.spark-apps.svc:4040 23/06/08 16:25:16 INFO KubernetesClusterSchedulerBackend: Shutting down all executors 23/06/08 16:25:16 INFO KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asking each executor to shut down 23/06/08 16:25:16 INFO KubernetesClientUtils: Spark configuration files loaded from Some(/usr/lib/spark/conf) : spark-env.sh,hive-site.xml,log4j2.properties,metrics.properties 23/06/08 16:25:16 INFO BasicExecutorFeatureStep: Decommissioning not enabled, skipping shutdown script 23/06/08 16:25:17 WARN ExecutorPodsSnapshotsStoreImpl: Exception when notifying snapshot subscriber. io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: POST at: https://kubernetes.default.svc/api/v1/namespaces/spark-apps/pods. Message: Pod "kyuubi-0a1b9d58-85e7-416c-aacb-c374e2e1b6b3-exec-1" is invalid: metadata.labels: Invalid value: "kyuubi-user-spark-sql-khwunchai-default-73bce6a4-df00-403e-bc5d-d1721e515f9d-f0ccb3889bd3576e-driver": must be no more than 63 characters. 
Received status: Status(apiVersion=v1, code=422, details=StatusDetails(causes=[StatusCause(field=metadata.labels, message=Invalid value: "kyuubi-user-spark-sql-khwunchai-default-73bce6a4-df00-403e-bc5d-d1721e515f9d-f0ccb3889bd3576e-driver": must be no more than 63 characters, reason=FieldValueInvalid, additionalProperties={})], group=null, kind=Pod, name=kyuubi-0a1b9d58-85e7-416c-aacb-c374e2e1b6b3-exec-1, retryAfterSeconds=null, uid=null, additionalProperties={}), kind=Status, message=Pod "kyuubi-0a1b9d58-85e7-416c-aacb-c374e2e1b6b3-exec-1" is invalid: metadata.labels: Invalid value: "kyuubi-user-spark-sql-khwunchai-default-73bce6a4-df00-403e-bc5d-d1721e515f9d-f0ccb3889bd3576e-driver": must be no more than 63 characters, metadata=ListMeta(_continue=null, remainingItemCount=null, resourceVersion=null, selfLink=null, additionalProperties={}), reason=Invalid, status=Failure, additionalProperties={}). at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:682) ~[kubernetes-client-5.12.2.jar:?] at io.fabric8.kubernetes.client.dsl.base.OperationSupport.requestFailure(OperationSupport.java:661) ~[kubernetes-client-5.12.2.jar:?] at io.fabric8.kubernetes.client.dsl.base.OperationSupport.assertResponseCode(OperationSupport.java:612) ~[kubernetes-client-5.12.2.jar:?] at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:555) ~[kubernetes-client-5.12.2.jar:?] at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleResponse(OperationSupport.java:518) ~[kubernetes-client-5.12.2.jar:?] at io.fabric8.kubernetes.client.dsl.base.OperationSupport.handleCreate(OperationSupport.java:305) ~[kubernetes-client-5.12.2.jar:?] at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:644) ~[kubernetes-client-5.12.2.jar:?] at io.fabric8.kubernetes.client.dsl.base.BaseOperation.handleCreate(BaseOperation.java:83) ~[kubernetes-client-5.12.2.jar:?] at io.fabric8.kubernetes.client.dsl.base.CreateOnlyResourceOperation.create(CreateOnlyResourceOperation.java:61) ~[kubernetes-client-5.12.2.jar:?] at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$requestNewExecutors$1(ExecutorPodsAllocator.scala:430) ~[spark-kubernetes_2.12-3.3.1-amzn-0.jar:3.3.1-amzn-0] at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:158) ~[scala-library-2.12.15.jar:?] at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.requestNewExecutors(ExecutorPodsAllocator.scala:412) ~[spark-kubernetes_2.12-3.3.1-amzn-0.jar:3.3.1-amzn-0] at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$onNewSnapshots$37(ExecutorPodsAllocator.scala:376) ~[spark-kubernetes_2.12-3.3.1-amzn-0.jar:3.3.1-amzn-0] at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$onNewSnapshots$37$adapted(ExecutorPodsAllocator.scala:369) ~[spark-kubernetes_2.12-3.3.1-amzn-0.jar:3.3.1-amzn-0] at scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62) ~[scala-library-2.12.15.jar:?] at scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55) ~[scala-library-2.12.15.jar:?] at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49) ~[scala-library-2.12.15.jar:?]
at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.onNewSnapshots(ExecutorPodsAllocator.scala:369) ~[spark-kubernetes_2.12-3.3.1-amzn-0.jar:3.3.1-amzn-0] at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$start$3(ExecutorPodsAllocator.scala:143) ~[spark-kubernetes_2.12-3.3.1-amzn-0.jar:3.3.1-amzn-0] at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsAllocator.$anonfun$start$3$adapted(ExecutorPodsAllocator.scala:143) ~[spark-kubernetes_2.12-3.3.1-amzn-0.jar:3.3.1-amzn-0] at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsSnapshotsStoreImpl$SnapshotsSubscriber.org$apache$spark$scheduler$cluster$k8s$ExecutorPodsSnapshotsStoreImpl$SnapshotsSubscriber$$processSnapshotsInternal(ExecutorPodsSnapshotsStoreImpl.scala:138) ~[spark-kubernetes_2.12-3.3.1-amzn-0.jar:3.3.1-amzn-0] at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsSnapshotsStoreImpl$SnapshotsSubscriber.processSnapshots(ExecutorPodsSnapshotsStoreImpl.scala:126) ~[spark-kubernetes_2.12-3.3.1-amzn-0.jar:3.3.1-amzn-0] at org.apache.spark.scheduler.cluster.k8s.ExecutorPodsSnapshotsStoreImpl.$anonfun$addSubscriber$1(ExecutorPodsSnapshotsStoreImpl.scala:81) ~[spark-kubernetes_2.12-3.3.1-amzn-0.jar:3.3.1-amzn-0] at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_362] at java.util.concurrent.FutureTask.runAndReset(FutureTask.java:308) ~[?:1.8.0_362] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:180) ~[?:1.8.0_362] at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:294) ~[?:1.8.0_362] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) ~[?:1.8.0_362] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) ~[?:1.8.0_362] at java.lang.Thread.run(Thread.java:750) ~[?:1.8.0_362]
```
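For reference, below is roughly the kind of fix I had in mind (an illustrative sketch only, not existing Kyuubi code): cap the generated engine/app name so the derived driver pod name stays within the 63-character limit Kubernetes enforces on label values, and keep uniqueness by replacing the truncated tail with a short hash of the full name. The `PodNameLimiter` object and its `limit`/`shortHash` methods are made-up names for this example, and in practice the cap would also need headroom for whatever suffix ends up appended to the driver pod name (`-f0ccb3889bd3576e-driver` above).

```scala
import java.security.MessageDigest

// Illustrative sketch only (not current Kyuubi code): keep generated names
// short enough to be usable as Kubernetes label values (max 63 characters).
object PodNameLimiter {

  private val K8sLabelMaxLen = 63

  // Short, stable hex digest of the full name, used to preserve uniqueness
  // after truncation.
  private def shortHash(s: String, len: Int = 8): String =
    MessageDigest.getInstance("SHA-256")
      .digest(s.getBytes("UTF-8"))
      .map("%02x".format(_))
      .mkString
      .take(len)

  // If the name already fits, keep it; otherwise truncate and append "-<hash>"
  // so two different long names never collapse into the same value.
  def limit(fullName: String, maxLen: Int = K8sLabelMaxLen): String =
    if (fullName.length <= maxLen) {
      fullName
    } else {
      val suffix = "-" + shortHash(fullName)
      fullName.take(maxLen - suffix.length) + suffix
    }
}

// Example with the engine name from this report:
// PodNameLimiter.limit(
//   "kyuubi_USER_SPARK_SQL_khwunchai_default_73bce6a4-df00-403e-bc5d-d1721e515f9d")
// returns a value no longer than 63 characters, ending in an 8-character hash.
```

Until something like this is available, the only workaround I have found is overriding `spark.app.name` with a shorter value, optionally combined with `spark.kubernetes.driver.label.*` / `spark.kubernetes.executor.label.*` to keep the user and session visible on the pods.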
### Kyuubi Server Configurations

```yaml
23/06/08 16:25:13 INFO SparkContext: Spark configuration: spark.app.id=spark-0968f860f58f469cba38861033b463bf spark.app.name=kyuubi_USER_SPARK_SQL_khwunchai_default_73bce6a4-df00-403e-bc5d-d1721e515f9d spark.app.startTime=1686241513713 spark.app.submitTime=1686241513129 spark.authenticate=true spark.blacklist.decommissioning.enabled=true spark.blacklist.decommissioning.timeout=1h spark.databricks.delta.schema.autoMerge.enabled=true spark.decommissioning.timeout.threshold=20 spark.default.parallelism=8 spark.driver.bindAddress=10.177.40.182 spark.driver.blockManager.port=7079 spark.driver.cores=1 spark.driver.defaultJavaOptions=-XX:OnOutOfMemoryError='kill -9 %p' -XX:+UseParallelGC -XX:InitiatingHeapOccupancyPercent=70
spark.driver.extraClassPath=/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/docker/usr/lib/hadoop-lzo/lib/*:/docker/usr/lib/hadoop/hadoop-aws.jar:/docker/usr/share/aws/aws-java-sdk/*:/docker/usr/share/aws/emr/emrfs/conf:/docker/usr/share/aws/emr/emrfs/lib/*:/docker/usr/share/aws/emr/emrfs/auxlib/*:/docker/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/docker/usr/share/aws/emr/security/conf:/docker/usr/share/aws/emr/security/lib/*:/docker/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/docker/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/docker/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/docker/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/usr/share/aws/redshift/jdbc/RedshiftJDBC.jar:/usr/share/aws/redshift/spark-redshift/lib/* spark.driver.extraJavaOptions=-XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED -XX:OnOutOfMemoryError='kill -9 %p' -XX:+UseParallelGC -XX:InitiatingHeapOccupancyPercent=70 spark.driver.extraLibraryPath=/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native spark.driver.host=spark-187df6889bd35db6-driver-svc.spark-apps.svc spark.driver.memory=3600M spark.driver.port=7078 spark.dynamicAllocation.cachedExecutorIdleTimeout=300s spark.dynamicAllocation.enabled=true spark.dynamicAllocation.executorAllocationRatio=0.33 spark.dynamicAllocation.initialExecutors=1 spark.dynamicAllocation.maxExecutors=2 spark.dynamicAllocation.shuffleTracking.enabled=true spark.eventLog.dir=s3://omise-data-platform-apps-staging/spark/logs spark.eventLog.enabled=true spark.executor.cores=1 spark.executor.defaultJavaOptions=-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseParallelGC -XX:InitiatingHeapOccupancyPercent=70 -XX:OnOutOfMemoryError='kill -9 %p'
spark.executor.extraClassPath=/usr/lib/hadoop-lzo/lib/*:/usr/lib/hadoop/hadoop-aws.jar:/usr/share/aws/aws-java-sdk/*:/usr/share/aws/emr/emrfs/conf:/usr/share/aws/emr/emrfs/lib/*:/usr/share/aws/emr/emrfs/auxlib/*:/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/usr/share/aws/emr/security/conf:/usr/share/aws/emr/security/lib/*:/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/docker/usr/lib/hadoop-lzo/lib/*:/docker/usr/lib/hadoop/hadoop-aws.jar:/docker/usr/share/aws/aws-java-sdk/*:/docker/usr/share/aws/emr/emrfs/conf:/docker/usr/share/aws/emr/emrfs/lib/*:/docker/usr/share/aws/emr/emrfs/auxlib/*:/docker/usr/share/aws/emr/goodies/lib/emr-spark-goodies.jar:/docker/usr/share/aws/emr/security/conf:/docker/usr/share/aws/emr/security/lib/*:/docker/usr/share/aws/hmclient/lib/aws-glue-datacatalog-spark-client.jar:/docker/usr/share/java/Hive-JSON-Serde/hive-openx-serde.jar:/docker/usr/share/aws/sagemaker-spark-sdk/lib/sagemaker-spark-sdk.jar:/docker/usr/share/aws/emr/s3select/lib/emr-s3-select-spark-connector.jar:/usr/share/aws/redshift/jdbc/RedshiftJDBC.jar:/usr/share/aws/redshift/spark-redshift/lib/* spark.executor.extraJavaOptions=-XX:+IgnoreUnrecognizedVMOptions --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.lang.invoke=ALL-UNNAMED --add-opens=java.base/java.lang.reflect=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED --add-opens=java.base/java.net=ALL-UNNAMED --add-opens=java.base/java.nio=ALL-UNNAMED --add-opens=java.base/java.util=ALL-UNNAMED --add-opens=java.base/java.util.concurrent=ALL-UNNAMED --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED --add-opens=java.base/sun.nio.ch=ALL-UNNAMED --add-opens=java.base/sun.nio.cs=ALL-UNNAMED --add-opens=java.base/sun.security.action=ALL-UNNAMED --add-opens=java.base/sun.util.calendar=ALL-UNNAMED --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+UseParallelGC -XX:InitiatingHeapOccupancyPercent=70 -XX:OnOutOfMemoryError='kill -9 %p' spark.executor.extraLibraryPath=/usr/lib/hadoop/lib/native:/usr/lib/hadoop-lzo/lib/native:/docker/usr/lib/hadoop/lib/native:/docker/usr/lib/hadoop-lzo/lib/native spark.executor.memory=7200M spark.executorEnv.SPARK_USER_NAME=khwunchai spark.files.fetchFailure.unRegisterOutputOnHost=true spark.hadoop.dynamodb.customAWSCredentialsProvider=*********(redacted) spark.hadoop.fs.defaultFS=file:/// spark.hadoop.fs.s3.customAWSCredentialsProvider=*********(redacted) spark.hadoop.fs.s3.getObject.initialSocketTimeoutMilliseconds=2000 spark.hadoop.hive.metastore.client.factory.class=com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version.emr_internal_use_only.EmrFileSystem=2 spark.hadoop.mapreduce.fileoutputcommitter.cleanup-failures.ignored.emr_internal_use_only.EmrFileSystem=true spark.hadoop.mapreduce.input.fileinputformat.list-status.num-threads=20 spark.history.fs.logDirectory=file:///var/log/spark/apps spark.history.ui.port=18080 spark.hive.server2.thrift.resultset.default.fetch.size=1000 spark.jars=file:/tmp/spark-b3f39e11-1a74-40f7-a84b-273d5a2ad361/kyuubi-spark-sql-engine_2.12-1.7.1.jar,local:///usr/share/aws/delta/lib/delta-core.jar,local:///usr/share/aws/delta/lib/delta-storage.jar spark.kryoserializer.buffer.max=256
spark.kubernetes.authenticate.driver.serviceAccountName=kyuubi-sparksql-engine spark.kubernetes.authenticate.executor.serviceAccountName=kyuubi-sparksql-engine spark.kubernetes.container.image=671219180197.dkr.ecr.ap-southeast-1.amazonaws.com/spark/emr-6.10.0:20230421 spark.kubernetes.container.image.pullPolicy=Always spark.kubernetes.driver.label.kyuubi-unique-tag=73bce6a4-df00-403e-bc5d-d1721e515f9d spark.kubernetes.driver.pod.name=kyuubi-user-spark-sql-khwunchai-default-73bce6a4-df00-403e-bc5d-d1721e515f9d-f0ccb3889bd3576e-driver spark.kubernetes.driver.podTemplateContainerName=spark-kubernetes-driver spark.kubernetes.driver.podTemplateFile=/opt/kyuubi/conf/driver-template.yaml spark.kubernetes.driver.request.cores=250m spark.kubernetes.driverEnv.SPARK_USER_NAME=khwunchai spark.kubernetes.executor.podNamePrefix=kyuubi-0a1b9d58-85e7-416c-aacb-c374e2e1b6b3 spark.kubernetes.executor.podTemplateContainerName=spark-kubernetes-executor spark.kubernetes.executor.podTemplateFile=/opt/spark/pod-template/pod-spec-template.yml spark.kubernetes.executor.request.cores=500m spark.kubernetes.file.upload.path=s3://omise-data-platform-apps-staging/spark/uploads/ spark.kubernetes.memoryOverheadFactor=0.1 spark.kubernetes.namespace=spark-apps spark.kubernetes.pyspark.pythonVersion=3 spark.kubernetes.resource.type=java spark.kubernetes.submitInDriver=true spark.kyuubi.client.ipAddress=192.168.1.101 spark.kyuubi.client.version=1.7.0 spark.kyuubi.credentials.hadoopfs.enabled=false spark.kyuubi.credentials.hive.enabled=false spark.kyuubi.engine.credentials= spark.kyuubi.engine.share.level=USER spark.kyuubi.engine.submit.time=1686241495689 spark.kyuubi.engine.type=SPARK_SQL spark.kyuubi.frontend.connection.url.use.hostname=false spark.kyuubi.frontend.protocols=THRIFT_BINARY,REST spark.kyuubi.ha.addresses=zookeeper-headless.spark.svc.cluster.local spark.kyuubi.ha.client.class=org.apache.kyuubi.ha.client.zookeeper.ZookeeperDiscoveryClient spark.kyuubi.ha.enabled=true spark.kyuubi.ha.engine.ref.id=73bce6a4-df00-403e-bc5d-d1721e515f9d spark.kyuubi.ha.namespace=/kyuubi_1.7.1_USER_SPARK_SQL/khwunchai/default spark.kyuubi.ha.zookeeper.auth.type=NONE spark.kyuubi.ha.zookeeper.client.port=2181 spark.kyuubi.ha.zookeeper.engine.auth.type=NONE spark.kyuubi.ha.zookeeper.session.timeout=600000 spark.kyuubi.server.ipAddress=0.0.0.0 spark.kyuubi.session.connection.url=0.0.0.0:10009 spark.kyuubi.session.engine.idle.timeout=PT20M spark.kyuubi.session.engine.initialize.timeout=120000 spark.kyuubi.session.real.user=khwunchai spark.logConf=true spark.master=k8s://https://kubernetes.default.svc:443 spark.redaction.regex=*********(redacted) spark.repl.class.outputDir=/var/data/spark-fce5fc27-0a38-451f-b83f-e3712babead1/spark-163261a5-b242-4996-aa67-65212e84d128/repl-85bea369-fd1b-414f-93d2-357c947d6e52 spark.repl.local.jars=file:/tmp/spark-b3f39e11-1a74-40f7-a84b-273d5a2ad361/kyuubi-spark-sql-engine_2.12-1.7.1.jar,local:///usr/share/aws/delta/lib/delta-core.jar,local:///usr/share/aws/delta/lib/delta-storage.jar spark.resourceManager.cleanupExpiredHost=true spark.scheduler.mode=FAIR spark.serializer=org.apache.spark.serializer.KryoSerializer spark.shuffle.service.enabled=false spark.sql.adaptive.enabled=true spark.sql.catalog.spark_catalog=org.apache.spark.sql.delta.catalog.DeltaCatalog spark.sql.catalogImplementation=hive spark.sql.emr.internal.extensions=com.amazonaws.emr.spark.EmrSparkSessionExtensions spark.sql.execution.topKSortFallbackThreshold=10000 spark.sql.extensions=io.delta.sql.DeltaSparkSessionExtension 
spark.sql.legacy.castComplexTypesToString.enabled=true spark.sql.parquet.datetimeRebaseModeInRead=CORRECTED spark.sql.parquet.datetimeRebaseModeInWrite=CORRECTED spark.sql.parquet.fs.optimized.committer.optimization-enabled=true spark.sql.parquet.int96RebaseModeInRead=CORRECTED spark.sql.parquet.int96RebaseModeInWrite=CORRECTED spark.sql.parquet.output.committer.class=com.amazon.emr.committer.EmrOptimizedSparkSqlParquetOutputCommitter spark.sql.sources.partitionColumnTypeInference.enabled=false spark.stage.attempt.ignoreOnDecommissionFetchFailure=true spark.submit.deployMode=client spark.submit.pyFiles= spark.ui.enabled=true spark.ui.port=4040 spark.yarn.heterogeneousExecutors.enabled=false
```

### Kyuubi Engine Configurations

_No response_

### Additional context

Spark version 3.3.1-amzn-0 (EMR Containers)

### Are you willing to submit PR?

- [X] Yes. I would be willing to submit a PR with guidance from the Kyuubi community to fix.
- [ ] No. I cannot submit a PR at this time.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at: [email protected]

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
