Mikhail Pochatkin created SPARK-38390:
-----------------------------------------
Summary: Spark submit k8s with proxy user
Key: SPARK-38390
URL: https://issues.apache.org/jira/browse/SPARK-38390
Project: Spark
Issue Type: Bug
Components: Kubernetes
Affects Versions: 3.1.1
Reporter: Mikhail Pochatkin
While running a Spark test job via spark-submit on Kubernetes, I ran into a problem with the --proxy-user option. Judging by the stack trace, spark-submit fails to authorize as the proxy user via a delegation token. Command line below:
{code:java}
exec /usr/bin/tini -s -- /bin/sh -c /usr/bin/kinit -c FILE:/tmp/krb5cc -kt \
/etc/test.keytab principal@REALM && /opt/spark/bin/spark-submit \
--class org.apache.spark.examples.DFSReadWriteTest \
--proxy-user ambari-qa \
--master k8s://https://${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT} \
--deploy-mode cluster \
--conf spark.app.name=spark-dfsreadwrite \
--conf spark.kubernetes.namespace=namespace \
--conf spark.kubernetes.container.image=gct.io/spark-operator/spark:v3.1.1 \
--conf spark.kubernetes.submission.waitAppCompletion=true \
--conf spark.driver.cores=1 \
--conf spark.driver.memory=512m \
--conf spark.kubernetes.driver.limit.cores=1 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-application-sa \
--conf spark.kubernetes.driver.label.app=spark-dfsreadwrite \
--conf spark.executor.instances=1 \
--conf spark.executor.cores=1 \
--conf spark.kubernetes.executor.limit.cores=1 \
--conf spark.kubernetes.executor.label.app=spark-dfsreadwrite \
--conf spark.kubernetes.hadoop.configMapName=hadoop-configmap \
--conf spark.kubernetes.kerberos.krb5.configMapName=kerberos-configmap \
--conf spark.kerberos.renewal.credentials=ccache \
--conf spark.hadoop.kerberos.keytab.login.autorenewal.enabled=true \
local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar \
/etc/profile /tmp/
{code}
Output from the command, including the stack trace:
{code:java}
++ id -u
+ myuid=185
++ id -g
+ mygid=0
+ set +e
++ getent passwd 185
+ uidentry=
+ set -e
+ '[' -z '' ']'
+ '[' -w /etc/passwd ']'
+ echo '185:x:185:0:anonymous uid:/opt/spark:/bin/false'
+ SPARK_CLASSPATH=':/opt/spark/jars/*'
+ env
+ grep SPARK_JAVA_OPT_
+ sort -t_ -k4 -n
+ sed 's/[^=]*=\(.*\)/\1/g'
+ readarray -t SPARK_EXECUTOR_JAVA_OPTS
+ '[' -n '' ']'
+ '[' -z ']'
+ '[' -z ']'
+ '[' -n '' ']'
+ '[' -z x ']'
+ SPARK_CLASSPATH='/opt/hadoop/conf::/opt/spark/jars/*'
+ '[' -z x ']'
+ SPARK_CLASSPATH='/opt/spark/conf:/opt/hadoop/conf::/opt/spark/jars/*'
+ case "$1" in
+ shift 1
+ CMD=("$SPARK_HOME/bin/spark-submit" --conf
"spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@")
+ exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf
spark.driver.bindAddress=<ip> --deploy-mode client --proxy-user ambari-qa
--properties-file /opt/spark/conf/spark.properties --class
org.apache.spark.examples.DFSReadWriteTest
local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar /etc/profile
/tmp/
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform
(file:/opt/spark/jars/spark-unsafe_2.12-3.1.1.jar) to constructor
java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of
org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal
reflective access operations
WARNING: All illegal access operations will be denied in a future release
22/02/25 10:33:30 WARN NativeCodeLoader: Unable to load native-hadoop library
for your platform... using builtin-java classes where applicable
Setting spark.hadoop.yarn.resourcemanager.principal to ambari-qa
Performing local word count
Creating SparkSession
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
22/02/25 10:33:31 INFO SparkContext: Running Spark version 3.1.1
22/02/25 10:33:31 INFO ResourceUtils:
==============================================================
22/02/25 10:33:31 INFO ResourceUtils: No custom resources configured for
spark.driver.
22/02/25 10:33:31 INFO ResourceUtils:
==============================================================
22/02/25 10:33:31 INFO SparkContext: Submitted application: DFS Read Write Test
22/02/25 10:33:31 INFO ResourceProfile: Default ResourceProfile created,
executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: ,
memory -> name: memory, amount: 512, script: , vendor: , offHeap -> name:
offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name:
cpus, amount: 1.0)
22/02/25 10:33:31 INFO ResourceProfile: Limiting resource is cpus at 1 tasks
per executor
22/02/25 10:33:31 INFO ResourceProfileManager: Added ResourceProfile id: 0
22/02/25 10:33:31 INFO SecurityManager: Changing view acls to: 185,ambari-qa
22/02/25 10:33:31 INFO SecurityManager: Changing modify acls to: 185,ambari-qa
22/02/25 10:33:31 INFO SecurityManager: Changing view acls groups to:
22/02/25 10:33:31 INFO SecurityManager: Changing modify acls groups to:
22/02/25 10:33:31 INFO SecurityManager: SecurityManager: authentication
disabled; ui acls disabled; users with view permissions: Set(185, ambari-qa);
groups with view permissions: Set(); users with modify permissions: Set(185,
ambari-qa); groups with modify permissions: Set()
22/02/25 10:33:32 INFO Utils: Successfully started service 'sparkDriver' on
port 7078.
22/02/25 10:33:32 INFO SparkEnv: Registering MapOutputTracker
22/02/25 10:33:32 INFO SparkEnv: Registering BlockManagerMaster
22/02/25 10:33:32 INFO BlockManagerMasterEndpoint: Using
org.apache.spark.storage.DefaultTopologyMapper for getting topology information
22/02/25 10:33:32 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
22/02/25 10:33:32 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
22/02/25 10:33:32 INFO DiskBlockManager: Created local directory at
/var/data/spark-3b0fe4a4-edb4-4144-9f9c-74e3ea583def/blockmgr-33259dcc-20aa-47cd-b09c-8c128de5f5eb
22/02/25 10:33:32 INFO MemoryStore: MemoryStore started with capacity 117.0 MiB
22/02/25 10:33:32 INFO SparkEnv: Registering OutputCommitCoordinator
22/02/25 10:33:33 INFO Utils: Successfully started service 'SparkUI' on port
4040.
22/02/25 10:33:33 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at
http://spark-dfsreadwrite-09f12c7f30714a72-driver-svc.compute.svc:4040
22/02/25 10:33:33 INFO SparkContext: Added JAR
local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar at
file:/opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar with timestamp
1645785211700
22/02/25 10:33:33 WARN SparkContext: The jar
local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar has been added
already. Overwriting of added jars is not supported in the current version.
22/02/25 10:33:33 INFO SparkKubernetesClientFactory: Auto-configuring K8S
client using current context from users K8S config file
22/02/25 10:33:35 INFO ExecutorPodsAllocator: Going to request 1 executors from
Kubernetes for ResourceProfile Id: 0, target: 1 running: 0.
22/02/25 10:33:36 INFO BasicExecutorFeatureStep: Decommissioning not enabled,
skipping shutdown script
22/02/25 10:33:36 INFO Utils: Successfully started service
'org.apache.spark.network.netty.NettyBlockTransferService' on port 7079.
22/02/25 10:33:36 INFO NettyBlockTransferService: Server created on
spark-dfsreadwrite-09f12c7f30714a72-driver-svc.compute.svc:7079
22/02/25 10:33:36 INFO BlockManager: Using
org.apache.spark.storage.RandomBlockReplicationPolicy for block replication
policy
22/02/25 10:33:36 INFO BlockManagerMaster: Registering BlockManager
BlockManagerId(driver,
spark-dfsreadwrite-09f12c7f30714a72-driver-svc.compute.svc, 7079, None)
22/02/25 10:33:36 INFO BlockManagerMasterEndpoint: Registering block manager
spark-dfsreadwrite-09f12c7f30714a72-driver-svc.compute.svc:7079 with 117.0 MiB
RAM, BlockManagerId(driver,
spark-dfsreadwrite-09f12c7f30714a72-driver-svc.compute.svc, 7079, None)
22/02/25 10:33:36 INFO BlockManagerMaster: Registered BlockManager
BlockManagerId(driver,
spark-dfsreadwrite-09f12c7f30714a72-driver-svc.compute.svc, 7079, None)
22/02/25 10:33:36 INFO BlockManager: Initialized BlockManager:
BlockManagerId(driver,
spark-dfsreadwrite-09f12c7f30714a72-driver-svc.compute.svc, 7079, None)
22/02/25 10:33:39 INFO
KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Registered executor
NettyRpcEndpointRef(spark-client://Executor) (<ip>:<port>) with ID 1,
ResourceProfileId 0
22/02/25 10:33:39 INFO KubernetesClusterSchedulerBackend: SchedulerBackend is
ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
22/02/25 10:33:39 INFO BlockManagerMasterEndpoint: Registering block manager
10.42.0.221:33711 with 117.0 MiB RAM, BlockManagerId(1, <ip>, <port>, None)
Writing local file to DFS
22/02/25 10:33:39 INFO SharedState: Setting hive.metastore.warehouse.dir
('null') to the value of spark.sql.warehouse.dir
('file:/opt/spark/work-dir/spark-warehouse').
22/02/25 10:33:39 INFO SharedState: Warehouse path is
'file:/opt/spark/work-dir/spark-warehouse'.
22/02/25 10:33:41 WARN Client: Exception encountered while connecting to the
server : javax.security.sasl.SaslException: GSS initiate failed [Caused by
GSSException: No valid credentials provided (Mechanism level: Failed to find
any Kerberos tgt)]
22/02/25 10:33:41 WARN Client: Exception encountered while connecting to the
server : javax.security.sasl.SaslException: GSS initiate failed [Caused by
GSSException: No valid credentials provided (Mechanism level: Failed to find
any Kerberos tgt)]
22/02/25 10:33:41 INFO RetryInvocationHandler: Exception while invoking
getFileInfo of class ClientNamenodeProtocolTranslatorPB over
<address>/<ip>:8020 after 1 fail over attempts. Trying to fail over immediately.
java.io.IOException: Failed on local exception: java.io.IOException:
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException:
No valid credentials provided (Mechanism level: Failed to find any Kerberos
tgt)]; Host Details : local host is:
"spark-dfsreadwrite-09f12c7f30714a72-driver/<ip>"; destination host is:
"<address>":8020;
at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:776)
at org.apache.hadoop.ipc.Client.call(Client.java:1480)
at org.apache.hadoop.ipc.Client.call(Client.java:1413)
at
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
at com.sun.proxy.$Proxy34.getFileInfo(Unknown Source)
at
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:776)
at
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown
Source)
at java.base/java.lang.reflect.Method.invoke(Unknown Source)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
at
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
at com.sun.proxy.$Proxy35.getFileInfo(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2108)
at
org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
at
org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
at
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
at
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1426)
at
org.apache.spark.examples.DFSReadWriteTest$.main(DFSReadWriteTest.scala:115)
at
org.apache.spark.examples.DFSReadWriteTest.main(DFSReadWriteTest.scala)
at
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
at
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown
Source)
at java.base/java.lang.reflect.Method.invoke(Unknown Source)
at
org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
at
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
at
org.apache.spark.deploy.SparkSubmit$$anon$1.run(SparkSubmit.scala:165)
at
org.apache.spark.deploy.SparkSubmit$$anon$1.run(SparkSubmit.scala:163)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at java.base/javax.security.auth.Subject.doAs(Unknown Source)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
at
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:163)
at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
at
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1030)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1039)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.IOException: javax.security.sasl.SaslException: GSS initiate
failed [Caused by GSSException: No valid credentials provided (Mechanism level:
Failed to find any Kerberos tgt)]
at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:688)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at java.base/javax.security.auth.Subject.doAs(Unknown Source)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
at
org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:651)
at
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:738)
at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:376)
at org.apache.hadoop.ipc.Client.getConnection(Client.java:1529)
at org.apache.hadoop.ipc.Client.call(Client.java:1452)
... 36 more
Caused by: javax.security.sasl.SaslException: GSS initiate failed [Caused by
GSSException: No valid credentials provided (Mechanism level: Failed to find
any Kerberos tgt)]
at
jdk.security.jgss/com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(Unknown
Source)
at
org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:414)
at
org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:561)
at org.apache.hadoop.ipc.Client$Connection.access$1900(Client.java:376)
at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:730)
at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:726)
at java.base/java.security.AccessController.doPrivileged(Native Method)
at java.base/javax.security.auth.Subject.doAs(Unknown Source)
at
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
at
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:726)
... 39 more
Caused by: GSSException: No valid credentials provided (Mechanism level: Failed
to find any Kerberos tgt)
at
java.security.jgss/sun.security.jgss.krb5.Krb5InitCredential.getInstance(Unknown
Source)
at
java.security.jgss/sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Unknown
Source)
at
java.security.jgss/sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Unknown
Source)
at
java.security.jgss/sun.security.jgss.GSSManagerImpl.getMechanismContext(Unknown
Source)
at
java.security.jgss/sun.security.jgss.GSSContextImpl.initSecContext(Unknown
Source)
at
java.security.jgss/sun.security.jgss.GSSContextImpl.initSecContext(Unknown
Source)
... 49 more
{code}
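The UserGroupInformation.doAs frames in the trace above show the mechanism involved: with --proxy-user, spark-submit runs the application's main() inside a doAs block for a proxy UGI, and the HDFS client then authenticates as the real (kinit'ed) user while asserting the proxy user. The GSSException means no TGT for the real user is visible inside the driver pod. A minimal stdlib-only sketch of that pattern — Ugi here is a hypothetical stand-in for Hadoop's UserGroupInformation, not the real API:

```java
import java.util.function.Supplier;

public class ProxyUserSketch {
    // Hypothetical stand-in for Hadoop's UserGroupInformation; NOT the real API.
    static final class Ugi {
        final String user;
        final Ugi realUser; // set for proxy users, null otherwise

        Ugi(String user, Ugi realUser) {
            this.user = user;
            this.realUser = realUser;
        }

        static Ugi createProxyUser(String proxy, Ugi real) {
            return new Ugi(proxy, real);
        }

        // In real Hadoop, doAs binds the user's Subject (Kerberos TGT,
        // delegation tokens) to the calling thread before running the action.
        <T> T doAs(Supplier<T> action) {
            return action.get();
        }
    }

    public static void main(String[] args) {
        Ugi real = new Ugi("principal@REALM", null);
        Ugi proxy = Ugi.createProxyUser("ambari-qa", real);
        // spark-submit --proxy-user wraps the application's main() like this;
        // in the failing run the equivalent action cannot complete because the
        // driver pod has no TGT for the real user to authenticate with.
        System.out.println(proxy.doAs(() -> proxy.user + " via " + proxy.realUser.user));
    }
}
```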
The main question is why the same case works fine with YARN. Is this a restriction of spark-submit on Kubernetes, or a configuration issue? The working YARN command:
{code:java}
kinit -kt test.keytab principal@REALM
spark-submit \
--class org.apache.spark.examples.DFSReadWriteTest \
--deploy-mode client \
--proxy-user ambari-qa \
--conf spark.app.name=spark-dfsreadwrite \
--conf spark.driver.cores=1 \
--conf spark.driver.memory=512m \
--conf spark.executor.instances=1 \
--conf spark.executor.cores=1 \
--conf spark.executor.memory=512m \
/opt/spark/examples/jars/spark-examples_2.12-3.0.1.jar \
/etc/profile /tmp
{code}
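A plausible difference between the two runs: in YARN client mode spark-submit executes on the host where kinit populated the ticket cache, so it can fetch HDFS delegation tokens for the proxy user before the application starts; in Kubernetes cluster mode spark-submit is re-executed inside the driver pod (visible in the log above), where no ccache exists. One avenue worth testing, assuming the keytab can be mounted into the driver pod, is keytab-based login instead of ccache-based credentials:

```
--conf spark.kerberos.principal=principal@REALM \
--conf spark.kerberos.keytab=/etc/test.keytab \
```

This is only a sketch of a possible workaround, not a confirmed fix for the underlying proxy-user issue.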
--
This message was sent by Atlassian Jira
(v8.20.1#820001)