melin created SPARK-47114:
-----------------------------
Summary: Spark driver pod fails to access the krb5 file
Key: SPARK-47114
URL: https://issues.apache.org/jira/browse/SPARK-47114
Project: Spark
Issue Type: New Feature
Components: Kubernetes
Affects Versions: 3.4.1
Reporter: melin
Spark runs on Kubernetes and accesses an external HDFS cluster secured with Kerberos:
{code:java}
./bin/spark-submit \
--master k8s://https://172.18.5.44:6443 \
--deploy-mode cluster \
--name spark-pi \
--class org.apache.spark.examples.SparkPi \
--conf spark.executor.instances=1 \
--conf spark.kubernetes.submission.waitAppCompletion=true \
--conf spark.kubernetes.driver.pod.name=spark-xxxxxxx \
--conf spark.kubernetes.executor.podNamePrefix=spark-executor-xxxxxxx \
--conf spark.kubernetes.driver.label.profile=production \
--conf spark.kubernetes.executor.label.profile=production \
--conf spark.kubernetes.namespace=superior \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
--conf spark.kubernetes.container.image=registry.cn-hangzhou.aliyuncs.com/melin1204/spark-jobserver:3.4.0 \
--conf spark.kubernetes.file.upload.path=hdfs://cdh1:8020/user/superior/kubernetes/ \
--conf spark.kubernetes.container.image.pullPolicy=Always \
--conf spark.kubernetes.container.image.pullSecrets=docker-reg-demos \
--conf spark.kubernetes.kerberos.krb5.path=/etc/krb5.conf \
--conf spark.kerberos.principal=superior/[email protected] \
--conf spark.kerberos.keytab=/root/superior.keytab \
--conf spark.kubernetes.driver.podTemplateFile=file:///root/spark-3.4.2-bin-hadoop3/driver.yaml \
--conf spark.kubernetes.executor.podTemplateFile=file:///root/spark-3.4.2-bin-hadoop3/executor.yaml \
file:///root/spark-3.4.2-bin-hadoop3/examples/jars/spark-examples_2.12-3.4.2.jar 5
{code}
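With spark.kubernetes.kerberos.krb5.path, Spark wraps the submitter's local /etc/krb5.conf in a generated ConfigMap and mounts it into the driver and executor pods. A hedged alternative worth trying (assumption: a ConfigMap holding krb5.conf can be created up front in the superior namespace; not verified on this cluster) is to reference a pre-created ConfigMap instead:
{code:bash}
# Sketch, not verified on this cluster: pre-create the ConfigMap once,
# with the data key named exactly "krb5.conf"...
kubectl create configmap krb5-conf --from-file=krb5.conf=/etc/krb5.conf -n superior

# ...then point Spark at it instead of uploading the local file:
#   --conf spark.kubernetes.kerberos.krb5.configMapName=krb5-conf
{code}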
{code:java}
(base) [root@cdh1 ~]# kubectl logs spark-xxxxxxx -n superior
++ id -u
+ myuid=0
++ id -g
+ mygid=0
+ set +e
++ getent passwd 0
+ uidentry=root:x:0:0:root:/root:/bin/bash
+ set -e
+ '[' -z root:x:0:0:root:/root:/bin/bash ']'
+ '[' -z /opt/java/openjdk ']'
+ SPARK_CLASSPATH=':/opt/spark/jars/*'
+ env
+ grep SPARK_JAVA_OPT_
+ sort -t_ -k4 -n
+ sed 's/[^=]*=\(.*\)/\1/g'
++ command -v readarray
+ '[' readarray ']'
+ readarray -t SPARK_EXECUTOR_JAVA_OPTS
+ '[' -n '' ']'
+ '[' -z ']'
+ '[' -z ']'
+ '[' -n '' ']'
+ '[' -z x ']'
+ SPARK_CLASSPATH='/opt/hadoop/conf::/opt/spark/jars/*'
+ '[' -z x ']'
+ SPARK_CLASSPATH='/opt/spark/conf:/opt/hadoop/conf::/opt/spark/jars/*'
+ case "$1" in
+ shift 1
+ CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@")
+ exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=10.244.2.56 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.examples.SparkPi spark-internal 5
Exception in thread "main" java.lang.IllegalArgumentException: Can't get Kerberos realm
	at org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:71)
	at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:315)
	at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:300)
	at org.apache.hadoop.security.UserGroupInformation.isAuthenticationMethodEnabled(UserGroupInformation.java:395)
	at org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:389)
	at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:1119)
	at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:385)
	at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
	at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
	at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
	at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
	at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1111)
	at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120)
	at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.IllegalArgumentException: KrbException: krb5.conf loading failed
	at java.security.jgss/javax.security.auth.kerberos.KerberosPrincipal.<init>(Unknown Source)
	at org.apache.hadoop.security.authentication.util.KerberosUtil.getDefaultRealm(KerberosUtil.java:120)
	at org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:69)
	... 13 more
(base) [root@cdh1 ~]# kubectl describe pod spark-xxxxxxx -n superior
Name: spark-xxxxxxx
Namespace: superior
Priority: 0
Service Account: spark
Node: cdh3/172.18.5.46
Start Time: Wed, 21 Feb 2024 14:18:14 +0800
Labels: profile=production
spark-app-name=spark-pi
spark-app-selector=spark-3b310ccc480c4cdcb9458a5c383ddeb7
spark-role=driver
spark-version=3.4.2
Annotations: <none>
Status: Failed
IP: 10.244.2.56
IPs:
IP: 10.244.2.56
Containers:
spark-kubernetes-driver:
Container ID: containerd://34bf52381dcaa293910e216c65bdf5c22c7cd583c1d14b3b472754e936dd1cac
Image: registry.cn-hangzhou.aliyuncs.com/melin1204/spark-jobserver:3.4.0
Image ID: registry.cn-hangzhou.aliyuncs.com/melin1204/spark-jobserver@sha256:18f70ce1036188d406083fcf65a8cac0d827e8cf12a460b3ba83e049af226e70
Ports: 7078/TCP, 7079/TCP, 4040/TCP
Host Ports: 0/TCP, 0/TCP, 0/TCP
Args:
driver
--properties-file
/opt/spark/conf/spark.properties
--class
org.apache.spark.examples.SparkPi
spark-internal
5
State: Terminated
Reason: Error
Exit Code: 1
Started: Wed, 21 Feb 2024 14:18:15 +0800
Finished: Wed, 21 Feb 2024 14:18:17 +0800
Ready: False
Restart Count: 0
Limits:
memory: 1408Mi
Requests:
cpu: 1
memory: 1408Mi
Environment:
SPARK_USER: superior
SPARK_APPLICATION_ID: spark-3b310ccc480c4cdcb9458a5c383ddeb7
SPARK_DRIVER_BIND_ADDRESS: (v1:status.podIP)
HADOOP_CONF_DIR: /opt/hadoop/conf
SPARK_LOCAL_DIRS: /var/data/spark-58155f4f-6cea-46aa-8cc5-8141cc1944a2
SPARK_CONF_DIR: /opt/spark/conf
Mounts:
/etc/krb5.conf from krb5-file (rw,path="krb5.conf")
/mnt/secrets/kerberos-keytab from kerberos-keytab (rw)
/opt/hadoop/conf from hadoop-properties (rw)
/opt/spark/conf from spark-conf-volume-driver (rw)
/opt/spark/pod-template from pod-template-volume (rw)
/var/data/spark-58155f4f-6cea-46aa-8cc5-8141cc1944a2 from spark-local-dir-1 (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-587xp (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
hadoop-properties:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: spark-pi-e6b6cb8dca5089a3-hadoop-config
Optional: false
krb5-file:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: spark-pi-e6b6cb8dca5089a3-krb5-file
Optional: false
kerberos-keytab:
Type: Secret (a volume populated by a Secret)
SecretName: spark-pi-e6b6cb8dca5089a3-kerberos-keytab
Optional: false
pod-template-volume:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: spark-pi-e6b6cb8dca5089a3-driver-podspec-conf-map
Optional: false
spark-local-dir-1:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
spark-conf-volume-driver:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: spark-drv-b539428dca5091ed-conf-map
Optional: false
kube-api-access-587xp:
Type: Projected (a volume that contains injected data from multiple sources)
TokenExpirationSeconds: 3607
ConfigMapName: kube-root-ca.crt
ConfigMapOptional: <nil>
DownwardAPI: true
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type    Reason     Age    From               Message
----    ------     ----   ----               -------
Normal  Scheduled  5m46s  default-scheduler  Successfully assigned superior/spark-xxxxxxx to cdh3
Normal  Pulling    5m45s  kubelet            Pulling image "registry.cn-hangzhou.aliyuncs.com/melin1204/spark-jobserver:3.4.0"
Normal  Pulled     5m45s  kubelet            Successfully pulled image "registry.cn-hangzhou.aliyuncs.com/melin1204/spark-jobserver:3.4.0" in 439ms (439ms including waiting)
Normal  Created    5m45s  kubelet            Created container spark-kubernetes-driver
Normal  Started    5m45s  kubelet            Started container spark-kubernetes-driver
{code}
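The pod description shows /etc/krb5.conf mounted from the generated ConfigMap with path="krb5.conf", yet the JVM reports "krb5.conf loading failed", so a first step is to check whether the ConfigMap content actually reached the pod. A debugging sketch (ConfigMap name taken from the pod description above; commands assume the objects still exist):
{code:bash}
# Inspect the generated krb5 ConfigMap: the data key should be "krb5.conf"
# and carry the full file content, not be empty.
kubectl get configmap spark-pi-e6b6cb8dca5089a3-krb5-file -n superior -o yaml

# If the file is mounted but empty (or mounted as a directory), the JVM's
# default Linux lookup of /etc/krb5.conf fails exactly like this. One
# workaround to try (assumption, not verified here) is to point the driver
# JVM at the file explicitly:
#   --conf spark.driver.extraJavaOptions=-Djava.security.krb5.conf=/etc/krb5.conf
{code}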
--
This message was sent by Atlassian Jira
(v8.20.10#820010)