JoonPark1 opened a new issue, #7226: URL: https://github.com/apache/kyuubi/issues/7226
### Code of Conduct

- [x] I agree to follow this project's [Code of Conduct](https://www.apache.org/foundation/policies/conduct)

### Search before asking

- [x] I have searched in the [issues](https://github.com/apache/kyuubi/issues?q=is%3Aissue) and found no similar issues.

### Describe the bug

This occurs in Kubernetes cluster deployment mode, with Kyuubi running on a Kubernetes cluster. When a large number of Spark batch jobs is submitted, Kyuubi attempts to spin up one Spark driver per batch job. Under heavy load, Kyuubi stores a record for each batch job via the MetadataManager with state "PENDING" and repeatedly polls each batch job's status until the server runs out of memory. On the next restart of the Kyuubi pod, the polling resumes, because the records are never updated: the Spark driver pods for those batch jobs were never created or scheduled in the first place. As a result, the batch-job records in the Kyuubi metadata store persist with a "state" field of "PENDING" and an "engine_state" field of "UNKNOWN". These records are never resolved, and the continued polling causes subsequent restarts of Kyuubi to run out of memory as well.

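To make the stuck state concrete, a minimal sketch (not part of the original report) of how the affected batches might be observed through the Kyuubi REST frontend. It assumes the v1 batch API (`GET /api/v1/batches` with `batchState`, `from`, and `size` parameters and a `batches` array in the response); the host, port, and credentials are placeholders rather than values from this deployment:

```python
# Minimal sketch: page through the batches reported by the Kyuubi REST frontend
# and collect the ones still stuck in PENDING. The service address, credentials,
# and response field names are assumptions for illustration only.
import requests

KYUUBI_REST = "http://kyuubi-rest.kyuubi-poc.svc:10099"  # hypothetical REST frontend address


def list_pending_batches(user: str, password: str, page_size: int = 100):
    """Return all batch entries whose state is still PENDING."""
    pending, offset = [], 0
    while True:
        resp = requests.get(
            f"{KYUUBI_REST}/api/v1/batches",
            params={"batchState": "PENDING", "from": offset, "size": page_size},
            auth=(user, password),
            timeout=30,
        )
        resp.raise_for_status()
        page = resp.json().get("batches", [])
        pending.extend(page)
        if len(page) < page_size:
            return pending
        offset += page_size


if __name__ == "__main__":
    for b in list_pending_batches("admin", "change-me"):
        # Field names ("id", "state", "appState") are assumptions about the response shape.
        print(b.get("id"), b.get("state"), b.get("appState"))
```
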
### Affects Version(s)

v1.10.2

### Kyuubi Server Log Output

```logtalk
2025-10-21 16:03:48.837 WARN org.apache.kyuubi.engine.KubernetesApplicationOperation: Waiting for driver pod with label: kyuubi-unique-tag=406b8bc1-457e-4115-ae55-50a0d39c061c to be created, elapsed time: 92106ms, return UNKNOWN status
2025-10-21 16:03:48.929 WARN org.apache.kyuubi.engine.KubernetesApplicationOperation: Waiting for driver pod with label: kyuubi-unique-tag=6d439666-0484-4902-9dab-39ad39f96b3e to be created, elapsed time: 92291ms, return UNKNOWN status
2025-10-21 16:03:48.929 WARN org.apache.kyuubi.engine.KubernetesApplicationOperation: Waiting for driver pod with label: kyuubi-unique-tag=fc5979a0-5599-4088-90ee-c5e995e0fca7 to be created, elapsed time: 92365ms, return UNKNOWN status
2025-10-21 16:03:48.930 WARN org.apache.kyuubi.engine.KubernetesApplicationOperation: Waiting for driver pod with label: kyuubi-unique-tag=6aacf45b-21b3-44a5-bdd1-1f05eaaec393 to be created, elapsed time: 92365ms, return UNKNOWN status
2025-10-21 16:03:48.930 WARN org.apache.kyuubi.engine.KubernetesApplicationOperation: Waiting for driver pod with label: kyuubi-unique-tag=2db1998a-e5e2-4a31-bc97-40f5f1b31345 to be created, elapsed time: 92258ms, return UNKNOWN status
2025-10-21 16:03:48.931 WARN org.apache.kyuubi.engine.KubernetesApplicationOperation: Waiting for driver pod with label: kyuubi-unique-tag=20a84034-3c6b-47b0-916b-199b0e0750da to be created, elapsed time: 92250ms, return UNKNOWN status
2025-10-21 16:03:48.932 WARN org.apache.kyuubi.engine.KubernetesApplicationOperation: Waiting for driver pod with label: kyuubi-unique-tag=3aa6e64e-a4af-443e-97c3-4296befd050a to be created, elapsed time: 92202ms, return UNKNOWN status
2025-10-21 16:03:48.937 WARN org.apache.kyuubi.engine.KubernetesApplicationOperation: Waiting for driver pod with label: kyuubi-unique-tag=d259e70a-ef06-41f3-8e2e-fe661552fae1 to be created, elapsed time: 92199ms, return UNKNOWN status
2025-10-21 16:03:48.938 WARN org.apache.kyuubi.engine.KubernetesApplicationOperation: Waiting for driver pod with label: kyuubi-unique-tag=f4e45523-f18a-4f69-b163-07b96666cee0 to be created, elapsed time: 92271ms, return UNKNOWN status
2025-10-21 16:03:49.135 WARN org.apache.kyuubi.engine.KubernetesApplicationOperation: Waiting for driver pod with label: kyuubi-unique-tag=171cacb5-4b47-4db4-a384-6f2925f927e6 to be created, elapsed time: 92505ms, return UNKNOWN status
2025-10-21 16:03:50.638 WARN org.apache.kyuubi.engine.KubernetesApplicationOperation: Waiting for driver pod with label: kyuubi-unique-tag=d98b10c6-e56a-429a-b86c-79f353ac18bb to be created, elapsed time: 94073ms, return UNKNOWN status
2025-10-21 16:03:50.640 WARN org.apache.kyuubi.engine.KubernetesApplicationOperation: Waiting for driver pod with label: kyuubi-unique-tag=a1e6f809-c1b7-41a7-b21f-007eaa2eaaf0 to be created, elapsed time: 94076ms, return UNKNOWN status
2025-10-21 16:03:50.729 WARN org.apache.kyuubi.engine.KubernetesApplicationOperation: Waiting for driver pod with label: kyuubi-unique-tag=0d71979b-3f34-485f-8088-9bada0308133 to be created, elapsed time: 94086ms, return UNKNOWN status
```

### Kyuubi Engine Log Output

```logtalk
No engine log output... it seems the k8s pod for the Kyuubi server crashes before it even has a chance to write the engine logs for each batch-job submission for my user...
```

### Kyuubi Server Configurations

```yaml
################################################## kyuubi server settings #############################################
kyuubi.kubernetes.master.address=https://azure-eastus2-st-085-dv-dl-001-9ejkzfkk.hcp.eastus2.azmk8s.io:443
kyuubi.kubernetes.namespace=kyuubi-poc
kyuubi.kubernetes.authenticate.driver.serviceAccountName=kyuubi-poc
kyuubi.kubernetes.trust.certificates=true
# defaults to POD we ran into edge case where imagepullbackoff and job is pending
kyuubi.kubernetes.application.state.source=CONTAINER
kyuubi.kubernetes.authenticate.oauthTokenFile=/var/run/secrets/kubernetes.io/serviceaccount/token
kyuubi.engine.kubernetes.submit.timeout=PT300S
# enable arrow configuration
kyuubi.operation.result.format=arrow
# kyuubi.operation.incremental.collect=true
################################################## kyuubi engine settings #############################################
kyuubi.engine.share.level=USER
# kyuubi.engine.share.level=SERVER
kyuubi.server.name=superset-poc-server
################################################## Very expiremental stuff ############################################
# kyuubi.engine.deregister.exception.messages=Error getting policies,serviceName=spark,httpStatusCode:400
# kyuubi.engine.deregister.job.max.failures=1
# kyuubi.engine.deregister.exception.ttl=PT10M
################################################## kyuubi admin list ##################################################
kyuubi.server.administrators=A242528,A295378,A270054
################################################## kyuubi profile settings ############################################
kyuubi.session.conf.advisor=org.apache.kyuubi.session.FileSessionConfAdvisor
##################################kyuubi engine kill disable settings #################################################
# kyuubi.engine.ui.stop.enabled=false
################################################## kyuubi engine clean up settings ####################################
kyuubi.kubernetes.spark.cleanupTerminatedDriverPod.kind=COMPLETED
kyuubi.kubernetes.terminatedApplicationRetainPeriod=PT5M
################################################## User specific defaults #############################################
# ___srv-spark-dbt-np___.kyuubi.session.engine.idle.timeout=PT30S
# ___srv-spark-dbt-np___.kyuubi.session.idle.timeout=PT30S
___srv-spark-dbt-np___.kyuubi.session.engine.initialize.timeout=PT10M
kyuubi.session.idle.timeout=PT15M
kyuubi.batch.session.idle.timeout=PT15M
kyuubi.engine.user.isolated.spark.session.idle.timeout=PT15M
################################################## Trino Engine #######################################################
kyuubi.frontend.protocols=REST,THRIFT_BINARY,TRINO
kyuubi.frontend.trino.bind.host=0.0.0.0
kyuubi.frontend.trino.bind.port=10011
################################################## kyuubi ldap auth ###################################################
kyuubi.authentication=LDAP
# kyuubi.authentication.ldap.url=ldaps://GEICO-LDAPS-FR-IH.geico.corp.net:636
kyuubi.authentication.ldap.url=ldaps://GEICO-LDAPS-FR-IH.geico.corp.net:636 ldaps://GEICO-LDAPS-PL-IH.geico.corp.net:636 ldaps://GEICO-LDAPS-PD-WL.geico.corp.net:636
kyuubi.authentication.ldap.binddn=CN=SRV-DESPT-RGR-LDP-NP,OU=Service Accounts,OU=Admin,DC=GEICO,DC=corp,DC=net
kyuubi.authentication.ldap.bindpw=_SYNC_LDAP_BIND_PASSWORD_
kyuubi.authentication.ldap.baseDN=OU=Admin,DC=GEICO,DC=corp,DC=net
kyuubi.authentication.ldap.userDNPattern=sAMAccountName=%s,OU=Admin,DC=GEICO,DC=corp,DC=net
kyuubi.authentication.ldap.userMembershipKey=memberOf
kyuubi.authentication.ldap.groupDNPattern=CN=%s,OU=Admin,DC=GEICO,DC=corp,DC=net
kyuubi.authentication.ldap.guidKey=sAMAccountName
kyuubi.authentication.ldap.groupClassKey=group
kyuubi.authentication.ldap.groupFilter=ENT-ASG-DATALAKEHOUSE-COMPUTE-PLATFORM-NP-USER,ENT-SBR-DATALAKEHOUSE_CONTRIBUTOR-NP-ASSIGNED,ENT-ASG-ADB-EDPCOR-SB-DED-CONTRIBUTOR,ENT-ASG-AZURE-DATAOPS-PLATFORM-NP-ADMIN
################################################## kyuubi enable UI ##################################################
kyuubi.frontend.rest.bind.host=0.0.0.0
################################################## kyuubi session settings ###########################################
# kyuubi.session.conf.restrict.list=spark.sql.optimizer.excludedRules,spark.kubernetes.driver.node.selector.label,spark.kubernetes.executor.node.selector.label,spark.master,spark.submit.deployMode,spark.kubernetes.namespace,spark.kubernetes.authenticate.driver.serviceAccountName,spark.kubernetes.driver.podTemplateFile,spark.kubernetes.executor.podTemplateFile,spark.ui.killEnabled,spark.redaction.regex,spark.sql.redaction.string.regex
spark.kyuubi.conf.restricted.list=spark.sql.optimizer.excludedRules,spark.kubernetes.driver.node.selector.label,spark.kubernetes.executor.node.selector.label,spark.master,spark.submit.deployMode,spark.kubernetes.namespace,spark.kubernetes.authenticate.driver.serviceAccountName,spark.kubernetes.driver.podTemplateFile,spark.kubernetes.executor.podTemplateFile,spark.ui.killEnabled,spark.redaction.regex,spark.sql.redaction.string.regex
kyuubi.session.conf.ignore.list=spark.sql.optimizer.excludedRules,spark.kubernetes.driver.node.selector.label,spark.kubernetes.executor.node.selector.label,spark.master,spark.submit.deployMode,spark.kubernetes.namespace,spark.kubernetes.authenticate.driver.serviceAccountName,spark.kubernetes.driver.podTemplateFile,spark.kubernetes.executor.podTemplateFile,spark.ui.killEnabled,spark.redaction.regex,spark.sql.redaction.string.regex
kyuubi.batch.conf.ignore.list=spark.kubernetes.driver.node.selector.label,spark.kubernetes.executor.node.selector.label,spark.master,spark.submit.deployMode,spark.kubernetes.namespace,spark.kubernetes.authenticate.driver.serviceAccountName,spark.kubernetes.driver.podTemplateFile,spark.kubernetes.executor.podTemplateFile,spark.ui.killEnabled,spark.redaction.regex,spark.sql.redaction.string.regex
################################################## kyuubi zookeeper settings #########################################
kyuubi.ha.addresses=http://etcd:2379
kyuubi.ha.client.class=org.apache.kyuubi.ha.client.etcd.EtcdDiscoveryClient
kyuubi.ha.namespace=kyuubi
################################################## Database configurations for metadata store ########################
kyuubi.metadata.store.jdbc.database.type=POSTGRESQL
kyuubi.metadata.store.jdbc.driver=org.postgresql.Driver
kyuubi.metadata.store.jdbc.url=jdbc:postgresql://kyuubipoc.datalakehouse.dv.prw.cloud.geico.net:5432/kyuubidb?tcpKeepAlive=true&logUnclosedConnections=true&prepareThreshold=0
kyuubi.metadata.store.jdbc.user=srv-kyuubi-user-dv
kyuubi.metadata.store.jdbc.password=_KYUUBI_DB_PWD_
kyuubi.metadata.store.jdbc.datasource.maxLifetime=180000
# README!!!!! max 200 connections, so take 200 / pod count for the max pool size. this value can be less than the max too
kyuubi.metadata.store.jdbc.datasource.maximumPoolSize=20
kyuubi.metadata.store.jdbc.datasource.connectionTimeout=30000
kyuubi.metadata.store.jdbc.datasource.leakDetectionThreshold=150000
################################################## Batch Defaults #####################################################
kyuubi.batchConf.spark.spark.master=k8s://azure-eastus2-st-085-dv-dl-001-9ejkzfkk.hcp.eastus2.azmk8s.io:443
kyuubi.batchConf.spark.spark.kubernetes.namespace=kyuubi-poc
kyuubi.batchConf.spark.spark.kubernetes.authenticate.driver.serviceAccountName=kyuubi-poc
kyuubi.batchConf.spark.spark.hadoop.fs.AbstractFileSystem.abfss.impl=org.apache.hadoop.fs.azurebfs.Abfss
# kyuubi.batchConf.spark.spark.hadoop.fs.azure.account.auth.type.gzfedpcordv1sto002.dfs.core.windows.net=OAuth
# kyuubi.batchConf.spark.spark.hadoop.fs.azure.account.oauth.provider.type.gzfedpcordv1sto002.dfs.core.windows.net=org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
# kyuubi.batchConf.spark.spark.hadoop.fs.azure.account.oauth2.client.endpoint.gzfedpcordv1sto002.dfs.core.windows.net=https://login.microsoftonline.com/7389d8c0-3607-465c-a69f-7d4426502912/oauth2/token
# kyuubi.batchConf.spark.spark.hadoop.fs.azure.account.oauth2.client.id.gzfedpcordv1sto002.dfs.core.windows.net=1d680742-02be-4b8c-969f-afafeccdcc0e
# kyuubi.batchConf.spark.spark.hadoop.fs.azure.account.oauth2.client.secret.gzfedpcordv1sto002.dfs.core.windows.net=_HADOOP_ADLS_SECRET_
kyuubi.batchConf.spark.hadoop.fs.azure.account.auth.type.gzfedpcordv1sto002.dfs.core.windows.net=SharedKey
kyuubi.batchConf.spark.hadoop.fs.azure.account.key.gzfedpcordv1sto002.dfs.core.windows.net=_GZFEDPCORDV1STO002_ADLS_KEY_
kyuubi.batchConf.spark.hadoop.fs.azure.account.auth.type.gzfdlhdrsdv1sto001.dfs.core.windows.net=SharedKey
kyuubi.batchConf.spark.hadoop.fs.azure.account.key.gzfdlhdrsdv1sto001.dfs.core.windows.net=_GZFDLHDRSDV1STO001_ADLS_KEY_
kyuubi.batchConf.spark.spark.hadoop.fs.azure.account.auth.type.gzfdlhingdv1sto001.dfs.core.windows.net=OAuth
kyuubi.batchConf.spark.spark.hadoop.fs.azure.account.oauth.provider.type.gzfdlhingdv1sto001.dfs.core.windows.net=org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
kyuubi.batchConf.spark.spark.hadoop.fs.azure.account.oauth2.client.endpoint.gzfdlhingdv1sto001.dfs.core.windows.net=https://login.microsoftonline.com/7389d8c0-3607-465c-a69f-7d4426502912/oauth2/token
kyuubi.batchConf.spark.spark.hadoop.fs.azure.account.oauth2.client.id.gzfdlhingdv1sto001.dfs.core.windows.net=dba6925b-465b-436b-b99c-f1b963988e48
kyuubi.batchConf.spark.spark.hadoop.fs.azure.account.oauth2.client.secret.gzfdlhingdv1sto001.dfs.core.windows.net=_DLH_ADLS_SECRET_
kyuubi.batchConf.spark.spark.hadoop.fs.azure.account.auth.type.gzfedpcordv1sto003.dfs.core.windows.net=OAuth
kyuubi.batchConf.spark.spark.hadoop.fs.azure.account.oauth.provider.type.gzfedpcordv1sto003.dfs.core.windows.net=org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
kyuubi.batchConf.spark.spark.hadoop.fs.azure.account.oauth2.client.endpoint.gzfedpcordv1sto003.dfs.core.windows.net=https://login.microsoftonline.com/7389d8c0-3607-465c-a69f-7d4426502912/oauth2/token
# kyuubi.batchConf.spark.spark.hadoop.fs.azure.account.oauth2.client.id.gzfedpcordv1sto003.dfs.core.windows.net=1d680742-02be-4b8c-969f-afafeccdcc0e
# kyuubi.batchConf.spark.spark.hadoop.fs.azure.account.oauth2.client.secret.gzfedpcordv1sto003.dfs.core.windows.net=_HADOOP_ADLS_SECRET_
kyuubi.batchConf.spark.spark.hadoop.fs.azure.account.oauth2.client.id.gzfedpcordv1sto003.dfs.core.windows.net=dba6925b-465b-436b-b99c-f1b963988e48
kyuubi.batchConf.spark.spark.hadoop.fs.azure.account.oauth2.client.secret.gzfedpcordv1sto003.dfs.core.windows.net=_DLH_ADLS_SECRET_
kyuubi.batchConf.spark.spark.hadoop.fs.azure.account.auth.type.gzfedpcordv1sto004.dfs.core.windows.net=OAuth
kyuubi.batchConf.spark.spark.hadoop.fs.azure.account.oauth.provider.type.gzfedpcordv1sto004.dfs.core.windows.net=org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
kyuubi.batchConf.spark.spark.hadoop.fs.azure.account.oauth2.client.endpoint.gzfedpcordv1sto004.dfs.core.windows.net=https://login.microsoftonline.com/7389d8c0-3607-465c-a69f-7d4426502912/oauth2/token
# kyuubi.batchConf.spark.spark.hadoop.fs.azure.account.oauth2.client.id.gzfedpcordv1sto004.dfs.core.windows.net=1d680742-02be-4b8c-969f-afafeccdcc0e
# kyuubi.batchConf.spark.spark.hadoop.fs.azure.account.oauth2.client.secret.gzfedpcordv1sto004.dfs.core.windows.net=_HADOOP_ADLS_SECRET_
kyuubi.batchConf.spark.spark.hadoop.fs.azure.account.oauth2.client.id.gzfedpcordv1sto004.dfs.core.windows.net=dba6925b-465b-436b-b99c-f1b963988e48
kyuubi.batchConf.spark.spark.hadoop.fs.azure.account.oauth2.client.secret.gzfedpcordv1sto004.dfs.core.windows.net=_DLH_ADLS_SECRET_
kyuubi.batchConf.spark.spark.hadoop.fs.azure.account.auth.type.gzfhststgdv1sto001.dfs.core.windows.net=OAuth
kyuubi.batchConf.spark.spark.hadoop.fs.azure.account.oauth.provider.type.gzfhststgdv1sto001.dfs.core.windows.net=org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
kyuubi.batchConf.spark.spark.hadoop.fs.azure.account.oauth2.client.endpoint.gzfhststgdv1sto001.dfs.core.windows.net=https://login.microsoftonline.com/7389d8c0-3607-465c-a69f-7d4426502912/oauth2/token
# kyuubi.batchConf.spark.spark.hadoop.fs.azure.account.oauth2.client.id.gzfhststgdv1sto001.dfs.core.windows.net=1d680742-02be-4b8c-969f-afafeccdcc0e
# kyuubi.batchConf.spark.spark.hadoop.fs.azure.account.oauth2.client.secret.gzfhststgdv1sto001.dfs.core.windows.net=_HADOOP_ADLS_SECRET_
kyuubi.batchConf.spark.spark.hadoop.fs.azure.account.oauth2.client.id.gzfhststgdv1sto001.dfs.core.windows.net=dba6925b-465b-436b-b99c-f1b963988e48
kyuubi.batchConf.spark.spark.hadoop.fs.azure.account.oauth2.client.secret.gzfhststgdv1sto001.dfs.core.windows.net=_DLH_ADLS_SECRET_
kyuubi.batchConf.spark.spark.eventLog.enabled=true
kyuubi.batchConf.spark.spark.eventLog.compress=true
kyuubi.batchConf.spark.spark.eventLog.compression.codec=zstd
kyuubi.batchConf.spark.spark.hadoop.fs.azure.write.request.size=33554432
kyuubi.batchConf.spark.spark.eventLog.dir=abfss://[email protected]/eventlogs
#kyuubi.batchConf.spark.spark.hadoop.hive.metastore.client.connect.retry.delay=5
#kyuubi.batchConf.spark.spark.hadoop.hive.metastore.client.socket.timeout=1800
#kyuubi.batchConf.spark.spark.hadoop.hive.metastore.uris=thrift://10.29.27.118:443
kyuubi.batchConf.spark.spark.hadoop.hive.server2.thrift.http.port=10002
kyuubi.batchConf.spark.spark.hadoop.hive.server2.thrift.port=10000
kyuubi.batchConf.spark.spark.hadoop.hive.server2.transport.mode=binary
#kyuubi.batchConf.spark.spark.hadoop.metastore.catalog.default=hive
kyuubi.batchConf.spark.spark.hadoop.hive.execution.engine=spark
kyuubi.batchConf.spark.spark.hadoop.hive.input.format=io.delta.hive.HiveInputFormat
kyuubi.batchConf.spark.spark.hadoop.hive.tez.input.format=io.delta.hive.HiveInputFormat
kyuubi.frontend.rest.proxy.jetty.client.responseBufferSize=16384
# kyuubi.batchConf.spark.spark.redaction.regex="(?i)secret|password|passwd|token|\.account\.key|credential|credentials|\.client\.secret\|_secret|appMgrInfo|pwd"
kyuubi.server.redaction.regex='(?i)(secret|password|passwd|token|\.account\.key|credential|credentials|\.client\.secret\|_secret|pwd|appMgrInfo)'
kyuubi.batchConf.spark.sql.redaction.string.regex=(?i)\bselect\b[\s\S]+?\bfrom\b[\s\S]+?(;|$)
######################################################### Optimizations ################################################
kyuubi.batchConf.spark.spark.sql.adaptive.enabled=true
kyuubi.batchConf.spark.spark.sql.adaptive.forceApply=false
kyuubi.batchConf.spark.spark.sql.adaptive.logLevel=info
kyuubi.batchConf.spark.spark.sql.adaptive.advisoryPartitionSizeInBytes=128m
kyuubi.batchConf.spark.spark.sql.adaptive.coalescePartitions.enabled=true
kyuubi.batchConf.spark.spark.sql.adaptive.coalescePartitions.minPartitionNum=1
kyuubi.batchConf.spark.spark.sql.adaptive.coalescePartitions.initialPartitionNum=1024
kyuubi.batchConf.spark.spark.sql.adaptive.fetchShuffleBlocksInBatch=true
kyuubi.batchConf.spark.spark.sql.adaptive.localShuffleReader.enabled=true
kyuubi.batchConf.spark.spark.sql.adaptive.skewJoin.enabled=true
kyuubi.batchConf.spark.spark.sql.adaptive.skewJoin.skewedPartitionFactor=5
kyuubi.batchConf.spark.spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes=400m
kyuubi.batchConf.spark.spark.sql.adaptive.nonEmptyPartitionRatioForBroadcastJoin=0.2
# DRA (shuffle tracking) defaults for batch engines
kyuubi.batchConf.spark.spark.dynamicAllocation.enabled=true
kyuubi.batchConf.spark.spark.dynamicAllocation.shuffleTracking.enabled=true
kyuubi.batchConf.spark.spark.dynamicAllocation.initialExecutors=2
kyuubi.batchConf.spark.spark.dynamicAllocation.minExecutors=2
kyuubi.batchConf.spark.spark.dynamicAllocation.maxExecutors=64
kyuubi.batchConf.spark.spark.dynamicAllocation.executorAllocationRatio=0.5
kyuubi.batchConf.spark.spark.dynamicAllocation.executorIdleTimeout=60s
kyuubi.batchConf.spark.spark.dynamicAllocation.cachedExecutorIdleTimeout=30min
kyuubi.batchConf.spark.spark.cleaner.periodicGC.interval=5min
kyuubi.batchConf.spark.spark.sql.autoBroadcastJoinThreshold=-1
kyuubi.operation.getTables.ignoreTableProperties=true
# spark executor ADLS Variables
# kyuubi.batchConf.spark.spark.kubernetes.driverEnv.AZURE_CLIENT_ID=1d680742-02be-4b8c-969f-afafeccdcc0e
kyuubi.batchConf.spark.spark.kubernetes.driverEnv.AZURE_CLIENT_ID=dba6925b-465b-436b-b99c-f1b963988e48
kyuubi.batchConf.spark.spark.kubernetes.driverEnv.AZURE_TENANT_ID=7389d8c0-3607-465c-a69f-7d4426502912
# kyuubi.batchConf.spark.spark.kubernetes.driverEnv.AZURE_CLIENT_SECRET=_HADOOP_ADLS_SECRET_
kyuubi.batchConf.spark.spark.kubernetes.driverEnv.AZURE_CLIENT_SECRET=_DLH_ADLS_SECRET_
# kyuubi.batchConf.spark.spark.executorEnv.AZURE_CLIENT_ID=1d680742-02be-4b8c-969f-afafeccdcc0e
kyuubi.batchConf.spark.spark.executorEnv.AZURE_CLIENT_ID=dba6925b-465b-436b-b99c-f1b963988e48
kyuubi.batchConf.spark.spark.executorEnv.AZURE_TENANT_ID=7389d8c0-3607-465c-a69f-7d4426502912
# kyuubi.batchConf.spark.spark.executorEnv.AZURE_CLIENT_SECRET=_HADOOP_ADLS_SECRET_
kyuubi.batchConf.spark.spark.executorEnv.AZURE_CLIENT_SECRET=_DLH_ADLS_SECRET_
# default resource configs
kyuubi.batchConf.spark.spark.executor.memory=20G
kyuubi.batchConf.spark.spark.executor.cores=6
kyuubi.batchConf.spark.spark.driver.memory=20G
kyuubi.batchConf.spark.spark.driver.cores=6
```

### Kyuubi Engine Configurations

```yaml
spark.master=k8s://azure-eastus2-st-085-dv-dl-001-9ejkzfkk.hcp.eastus2.azmk8s.io:443
spark.submit.deployMode=cluster
spark.kubernetes.namespace=kyuubi-poc
spark.kubernetes.authenticate.driver.serviceAccountName=kyuubi-poc
#testing image with spark-hadoop-cloud dep
spark.kubernetes.container.image=geiconp.azurecr.io/edposs/edpcor/spark/datalakehouse-spark-3.5.1:2025-10-09T17-23-02.1.9477612
#spark.kubernetes.container.image=geiconp.azurecr.io/edposs/edpcor/dlh-apache/spark-3.5.6-s2.12-j17-py3-dlh:2025-08-19T20-05-36.1.8886835
spark.hadoop.hive.server2.transport.mode=binary
spark.hadoop.hive.execution.engine=spark
spark.hadoop.hive.input.format=io.delta.hive.HiveInputFormat
spark.hadoop.hive.tez.input.format=io.delta.hive.HiveInputFormat
spark.sql.warehouse.dir=abfss://[email protected]/warehouse
spark.hadoop.fs.defaultFS=abfss://[email protected]
spark.hadoop.fs.AbstractFileSystem.abfss.impl=org.apache.hadoop.fs.azurebfs.Abfss
# spark.hadoop.fs.azure.account.auth.type.gzfedpcordv1sto002.dfs.core.windows.net=OAuth
# spark.hadoop.fs.azure.account.oauth.provider.type.gzfedpcordv1sto002.dfs.core.windows.net=org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
# spark.hadoop.fs.azure.account.oauth2.client.endpoint.gzfedpcordv1sto002.dfs.core.windows.net=https://login.microsoftonline.com/7389d8c0-3607-465c-a69f-7d4426502912/oauth2/token
# spark.hadoop.fs.azure.account.oauth2.client.id.gzfedpcordv1sto002.dfs.core.windows.net=1d680742-02be-4b8c-969f-afafeccdcc0e
# spark.hadoop.fs.azure.account.oauth2.client.secret.gzfedpcordv1sto002.dfs.core.windows.net=_HADOOP_ADLS_SECRET_
spark.hadoop.fs.azure.account.auth.type.gzfedpcordv1sto002.dfs.core.windows.net=SharedKey
spark.hadoop.fs.azure.account.key.gzfedpcordv1sto002.dfs.core.windows.net=_GZFEDPCORDV1STO002_ADLS_KEY_
spark.hadoop.fs.azure.account.auth.type.gzfdlhdrsdv1sto001.dfs.core.windows.net=SharedKey
spark.hadoop.fs.azure.account.key.gzfdlhdrsdv1sto001.dfs.core.windows.net=_GZFDLHDRSDV1STO001_ADLS_KEY_
spark.hadoop.fs.azure.account.auth.type.gzfdlhingdv1sto001.dfs.core.windows.net=OAuth
spark.hadoop.fs.azure.account.oauth.provider.type.gzfdlhingdv1sto001.dfs.core.windows.net=org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
spark.hadoop.fs.azure.account.oauth2.client.endpoint.gzfdlhingdv1sto001.dfs.core.windows.net=https://login.microsoftonline.com/7389d8c0-3607-465c-a69f-7d4426502912/oauth2/token
spark.hadoop.fs.azure.account.oauth2.client.id.gzfdlhingdv1sto001.dfs.core.windows.net=dba6925b-465b-436b-b99c-f1b963988e48
spark.hadoop.fs.azure.account.oauth2.client.secret.gzfdlhingdv1sto001.dfs.core.windows.net=_DLH_ADLS_SECRET_
spark.hadoop.fs.azure.account.auth.type.gzfedpcordv1sto003.dfs.core.windows.net=OAuth
spark.eventLog.compress=true
spark.eventLog.compression.codec=zstd
spark.hadoop.fs.azure.write.request.size=33554432
spark.hadoop.fs.azure.account.oauth.provider.type.gzfedpcordv1sto003.dfs.core.windows.net=org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
spark.hadoop.fs.azure.account.oauth2.client.endpoint.gzfedpcordv1sto003.dfs.core.windows.net=https://login.microsoftonline.com/7389d8c0-3607-465c-a69f-7d4426502912/oauth2/token
# spark.hadoop.fs.azure.account.oauth2.client.id.gzfedpcordv1sto003.dfs.core.windows.net=1d680742-02be-4b8c-969f-afafeccdcc0e
# spark.hadoop.fs.azure.account.oauth2.client.secret.gzfedpcordv1sto003.dfs.core.windows.net=_HADOOP_ADLS_SECRET_
spark.hadoop.fs.azure.account.oauth2.client.id.gzfedpcordv1sto003.dfs.core.windows.net=dba6925b-465b-436b-b99c-f1b963988e48
spark.hadoop.fs.azure.account.oauth2.client.secret.gzfedpcordv1sto003.dfs.core.windows.net=_DLH_ADLS_SECRET_
spark.hadoop.fs.azure.account.auth.type.gzfedpcordv1sto004.dfs.core.windows.net=OAuth
spark.hadoop.fs.azure.account.oauth.provider.type.gzfedpcordv1sto004.dfs.core.windows.net=org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
spark.hadoop.fs.azure.account.oauth2.client.endpoint.gzfedpcordv1sto004.dfs.core.windows.net=https://login.microsoftonline.com/7389d8c0-3607-465c-a69f-7d4426502912/oauth2/token
# spark.hadoop.fs.azure.account.oauth2.client.id.gzfedpcordv1sto004.dfs.core.windows.net=1d680742-02be-4b8c-969f-afafeccdcc0e
# spark.hadoop.fs.azure.account.oauth2.client.secret.gzfedpcordv1sto004.dfs.core.windows.net=_HADOOP_ADLS_SECRET_
spark.hadoop.fs.azure.account.oauth2.client.id.gzfedpcordv1sto004.dfs.core.windows.net=dba6925b-465b-436b-b99c-f1b963988e48
spark.hadoop.fs.azure.account.oauth2.client.secret.gzfedpcordv1sto004.dfs.core.windows.net=_DLH_ADLS_SECRET_
spark.hadoop.fs.azure.account.auth.type.gzfhststgdv1sto001.dfs.core.windows.net=OAuth
spark.hadoop.fs.azure.account.oauth.provider.type.gzfhststgdv1sto001.dfs.core.windows.net=org.apache.hadoop.fs.azurebfs.oauth2.ClientCredsTokenProvider
spark.hadoop.fs.azure.account.oauth2.client.endpoint.gzfhststgdv1sto001.dfs.core.windows.net=https://login.microsoftonline.com/7389d8c0-3607-465c-a69f-7d4426502912/oauth2/token
# spark.hadoop.fs.azure.account.oauth2.client.id.gzfhststgdv1sto001.dfs.core.windows.net=1d680742-02be-4b8c-969f-afafeccdcc0e
# spark.hadoop.fs.azure.account.oauth2.client.secret.gzfhststgdv1sto001.dfs.core.windows.net=_HADOOP_ADLS_SECRET_
spark.hadoop.fs.azure.account.oauth2.client.id.gzfhststgdv1sto001.dfs.core.windows.net=dba6925b-465b-436b-b99c-f1b963988e48
spark.hadoop.fs.azure.account.oauth2.client.secret.gzfhststgdv1sto001.dfs.core.windows.net=_DLH_ADLS_SECRET_
spark.kubernetes.file.upload.path=abfss://[email protected]/fileupload
spark.executor.memory=16G
spark.executor.cores=8
spark.driver.memory=200G
spark.driver.cores=40
spark.driver.maxResultSize=20g
spark.scheduler.mode=FAIR
spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog
spark.sql.adaptive.enabled=true
spark.decommission.enabled=true
spark.dynamicAllocation.enabled=true
spark.dynamicAllocation.minExecutors=16
spark.dynamicAllocation.maxExecutors=64
spark.dynamicAllocation.executorAllocationRatio=0.5
spark.kubernetes.driver.node.selector.label=nodepool2
spark.kubernetes.executor.node.selector.label=nodepool3
spark.driver.extraJavaOptions=-Divy.cache.dir=/tmp -Divy.home=/tmp -javaagent:/opt/spark/jars/jmx_prometheus_javaagent-1.0.1.jar=7778:/opt/spark/conf/config.yaml -Djava.security.manager=allow -Dio.netty.tryReflectionSetAccessible=true -XX:+UseG1GC
spark.executor.extraJavaOptions=-Divy.cache.dir=/tmp -Divy.home=/tmp -javaagent:/opt/spark/jars/jmx_prometheus_javaagent-1.0.1.jar=7778:/opt/spark/conf/config.yaml -Djava.security.manager=allow -Dio.netty.tryReflectionSetAccessible=true -XX:+UseG1GC
spark.kubernetes.executor.annotation.prometheus.io/port=7778
spark.kubernetes.executor.annotation.prometheus.io/scrape=true
spark.kubernetes.executor.annotation.prometheus.io/path=/metrics
spark.kubernetes.driver.annotation.prometheus.io/scrape=true
spark.kubernetes.driver.annotation.prometheus.io/port=7778
spark.kubernetes.driver.annotation.prometheus.io/path=/metrics
spark.kubernetes.executor.annotation.k8s.grafana.com/scrape=true
spark.kubernetes.executor.annotation.k8s.grafana.com/metrics.path=/metrics
spark.kubernetes.executor.annotation.k8s.grafana.com/metrics.portNumber=7778
spark.kubernetes.driver.annotation.k8s.grafana.com/scrape=true
spark.kubernetes.driver.annotation.k8s.grafana.com/metrics.path=/metrics
spark.kubernetes.driver.annotation.k8s.grafana.com/metrics.portNumber=7778
## Gang Scheduling Configs #
# spark.kubernetes.scheduler.name=yunikorn
# spark.kubernetes.driver.label.queue=root.kyuubi-poc
# spark.kubernetes.executor.label.queue=root.kyuubi-poc
# spark.kubernetes.driver.annotation.yunikorn.apache.org/schedulingPolicyParameters="placeholderTimeoutInSeconds=30 gangSchedulingStyle=Hard"
# spark.kubernetes.driver.annotation.yunikorn.apache.org/task-group-name="spark-driver"
# spark.kubernetes.executor.annotation.yunikorn.apache.org/task-group-name="spark-executor"
#
# spark.kubernetes.driver.annotation.yunikorn.apache.org/task-groups=[{"name":"spark-driver","minMember":1,"minResource":{"cpu":"40","memory":"200Gi"}},{"name":"spark-executor","minMember":2,"minResource":{"cpu":"40","memory":"180Gi"}}]
# spark.kubernetes.driver.annotation.yunikorn.apache.org/task-groups=[{"name":"spark-driver","minMember":1,"minResource":{"cpu":"40","memory":"200Gi"}},{"name":"spark-executor","minMember":2,"minResource":{"cpu":"40","memory":"180Gi"},"tolerations":[{"key":"kubernetes.azure.com/scalesetpriority","operator":"Equal","value":"spot","effect":"NoSchedule"}]}]
spark.excludeOnFailure.enabled=true
spark.metrics.conf=/opt/spark/conf/metrics.properties
spark.metrics.namespace=${spark.app.name}
spark.eventLog.enabled=true
spark.eventLog.dir=abfss://[email protected]/eventlogs
spark.sql.extensions=org.apache.kyuubi.plugin.spark.authz.ranger.RangerSparkExtension,org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
# spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
spark.kubernetes.executor.podTemplateFile=/opt/kyuubi/conf/spotTemplate.yml
spark.kubernetes.driver.podTemplateFile=/opt/kyuubi/conf/driverTemplate.yml
# Optimizations
spark.sql.redaction.string.regex=(?i)\bselect\b[\s\S]+?\bfrom\b[\s\S]+?(;|$)
# spark.redaction.regex=(?i)secret|password|passwd|token|key|credential|credentials|pwd
# spark.redaction.regex="(?i)secret|password|passwd|token|\.account\.key|credential|credentials|\.client\.secret\|_secret|pwd"
# test new redaction
spark.redaction.regex=(?i)secret|password|passwd|token|\.account\.key|credential|credentials|pwd|appMgrInfo
spark.sql.adaptive.enabled=true
spark.sql.adaptive.forceApply=false
spark.sql.adaptive.logLevel=info
spark.sql.adaptive.advisoryPartitionSizeInBytes=256m
spark.sql.adaptive.coalescePartitions.enabled=true
spark.sql.adaptive.coalescePartitions.minPartitionNum=1
spark.sql.adaptive.coalescePartitions.initialPartitionNum=1024
spark.sql.adaptive.fetchShuffleBlocksInBatch=true
spark.sql.adaptive.localShuffleReader.enabled=true
spark.sql.adaptive.skewJoin.enabled=true
spark.sql.adaptive.skewJoin.skewedPartitionFactor=5
spark.sql.adaptive.skewJoin.skewedPartitionThresholdInBytes=400m
spark.sql.adaptive.nonEmptyPartitionRatioForBroadcastJoin=0.2
spark.sql.autoBroadcastJoinThreshold=-1
# Plugins (disable Gluten globally; enable only in Gluten profile)
spark.plugins=io.dataflint.spark.SparkDataflintPlugin
# TPCDS catalog configs
spark.sql.catalog.tpcds=org.apache.kyuubi.spark.connector.tpcds.TPCDSCatalog
# spark.sql.catalog.tpcds.excludeDatabases=sf30000
spark.sql.catalog.tpcds.useAnsiStringType=false
spark.sql.catalog.tpcds.useTableSchema_2_6=true
spark.sql.catalog.tpcds.read.maxPartitionBytes=128m
# Polaris
spark.sql.defaultCatalog=polaris
spark.sql.catalog.polaris=org.apache.iceberg.spark.SparkCatalog
spark.sql.catalog.polaris.warehouse=dv-polaris
#spark.sql.catalog.polaris.header.X-Iceberg-Access-Delegation=vended-credentials
spark.sql.catalog.polaris.catalog-impl=org.apache.iceberg.rest.RESTCatalog
spark.sql.catalog.polaris.uri=http://10.16.188.108:8181/api/catalog
spark.sql.catalog.polaris.credential=0853b716c1ffaad3:_POLARIS_CRED_
spark.sql.catalog.polaris.scope=PRINCIPAL_ROLE:ALL
spark.sql.catalog.polaris.token-refresh-enabled=true
spark.sql.catalog.polaris.oauth2-server-uri=http://10.16.188.108:8181/api/catalog/v1/oauth/tokens
# spark executor ADLS Variables
# spark.kubernetes.driverEnv.AZURE_CLIENT_ID=1d680742-02be-4b8c-969f-afafeccdcc0e
spark.kubernetes.driverEnv.AZURE_CLIENT_ID=dba6925b-465b-436b-b99c-f1b963988e48
spark.kubernetes.driverEnv.AZURE_TENANT_ID=7389d8c0-3607-465c-a69f-7d4426502912
# spark.kubernetes.driverEnv.AZURE_CLIENT_SECRET=_HADOOP_ADLS_SECRET_
spark.kubernetes.driverEnv.AZURE_CLIENT_SECRET=_DLH_ADLS_SECRET_
# spark.executorEnv.AZURE_CLIENT_ID=1d680742-02be-4b8c-969f-afafeccdcc0e
spark.executorEnv.AZURE_CLIENT_ID=dba6925b-465b-436b-b99c-f1b963988e48
spark.executorEnv.AZURE_TENANT_ID=7389d8c0-3607-465c-a69f-7d4426502912
# spark.executorEnv.AZURE_CLIENT_SECRET=_HADOOP_ADLS_SECRET_
spark.executorEnv.AZURE_CLIENT_SECRET=_DLH_ADLS_SECRET_
# impersonation settings
hive.server2.enable.doAs=true
# Spark UI TITAN integration:
spark.executorEnv.SPARK_EXECUTOR_ATTRIBUTE_APP_ID='$(SPARK_APPLICATION_ID)'
spark.executorEnv.SPARK_EXECUTOR_ATTRIBUTE_EXECUTOR_ID='$(SPARK_EXECUTOR_ID)'
# enable additional metrics:
spark.executor.metrics.fileSystemSchemes=hdfs,file,abfss,abfs,s3a
spark.metrics.appStatusSource.enabled=true
spark.sql.streaming.metricsEnabled=true
spark.metrics.executorMetricsSource.enabled=true
# Testing intermediate manifest commiter to azure through v1 of FileOutput Commiter algorithm to handle file renames/merge to dest!
spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=1
#enable intermediate manifest commiter via binding to spark!
#spark.hadoop.mapreduce.outputcommitter.factory.scheme.abfs=org.apache.hadoop.fs.azurebfs.commit.AzureManifestCommitterFactory
#spark.hadoop.mapreduce.outputcommitter.factory.scheme.abfss=org.apache.hadoop.fs.azurebfs.commit.AzureManifestCommitterFactory
#spark.sql.parquet.output.committer.class=org.apache.spark.internal.io.cloud.BindingParquetOutputCommitter
#spark.sql.sources.commitProtocolClass=org.apache.spark.internal.io.cloud.PathOutputCommitProtocol
#spark.hadoop.mapreduce.manifest.committer.summary.report.directory=abfss://[email protected]/dv-spark-commit-report
# DataFlint
# spark.plugins=io.dataflint.spark.SparkDataflintPlugin
# ivy settings for debug
spark.jars.ivy.log.level=DEBUG
# spark ui enabled set to false
spark.ui.killEnabled=false
```

### Additional context

_No response_

### Are you willing to submit PR?

- [x] Yes. I would be willing to submit a PR with guidance from the Kyuubi community to fix.
- [ ] No. I cannot submit a PR at this time.

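As a rough sketch related to the stuck records described in the bug report, the stale rows might also be inspected directly in the PostgreSQL metadata store configured in the server settings above. The `metadata` table and the `identifier`, `state`, `engine_state`, and `create_time` columns are assumptions based on Kyuubi's default JDBC metadata schema and the field names used in the description, so they should be verified against the deployed schema; the connection details mirror the reported `kyuubi.metadata.store.jdbc.url`, with the password left as a placeholder (requires the `psycopg2` package):

```python
# Rough sketch: count and list batch metadata rows stuck in PENDING with an
# UNKNOWN engine state. Table and column names are assumptions about the
# default Kyuubi JDBC metadata schema; the password is a placeholder.
import psycopg2

conn = psycopg2.connect(
    host="kyuubipoc.datalakehouse.dv.prw.cloud.geico.net",
    port=5432,
    dbname="kyuubidb",
    user="srv-kyuubi-user-dv",
    password="<redacted>",
)

with conn, conn.cursor() as cur:
    cur.execute(
        """
        SELECT identifier, create_time
        FROM metadata
        WHERE state = 'PENDING' AND engine_state = 'UNKNOWN'
        ORDER BY create_time
        """
    )
    rows = cur.fetchall()
    print(f"{len(rows)} batch records stuck in PENDING/UNKNOWN")
    for identifier, create_time in rows:
        print(identifier, create_time)
```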
