Mikhail Pochatkin created SPARK-38390:
-----------------------------------------

             Summary: Spark submit k8s with proxy user
                 Key: SPARK-38390
                 URL: https://issues.apache.org/jira/browse/SPARK-38390
             Project: Spark
          Issue Type: Bug
          Components: Kubernetes
    Affects Versions: 3.1.1
            Reporter: Mikhail Pochatkin


While trying to run a Spark test with spark-submit on Kubernetes, I ran into 
a problem with the --proxy-user option. Judging by the stack trace, 
spark-submit fails to authenticate on behalf of the proxy user via a 
delegation token. Command line below:
{code:java}
exec /usr/bin/tini -s -- /bin/sh -c /usr/bin/kinit -c FILE:/tmp/krb5cc -kt \
/etc/test.keytab principal@REALM && /opt/spark/bin/spark-submit \
--class org.apache.spark.examples.DFSReadWriteTest \
--proxy-user ambari-qa \
--master k8s://https://${KUBERNETES_SERVICE_HOST}:${KUBERNETES_SERVICE_PORT} \
--deploy-mode cluster \
--conf spark.app.name=spark-dfsreadwrite \
--conf spark.kubernetes.namespace=namespace \
--conf spark.kubernetes.container.image=gct.io/spark-operator/spark:v3.1.1 \
--conf spark.kubernetes.submission.waitAppCompletion=true \
--conf spark.driver.cores=1 \
--conf spark.driver.memory=512m \
--conf spark.kubernetes.driver.limit.cores=1 \
--conf spark.kubernetes.authenticate.driver.serviceAccountName=spark-application-sa \
--conf spark.kubernetes.driver.label.app=spark-dfsreadwrite \
--conf spark.executor.instances=1 \
--conf spark.executor.cores=1 \
--conf spark.kubernetes.executor.limit.cores=1 \
--conf spark.kubernetes.executor.label.app=spark-dfsreadwrite \
--conf spark.kubernetes.hadoop.configMapName=hadoop-configmap \
--conf spark.kubernetes.kerberos.krb5.configMapName=kerberos-configmap \
--conf spark.kerberos.renewal.credentials=ccache \
--conf spark.hadoop.kerberos.keytab.login.autorenewal.enabled=true \
local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar \
/etc/profile /tmp/
{code}
Output from the command, with stack trace:
{code:java}
++ id -u
+ myuid=185
++ id -g
+ mygid=0
+ set +e
++ getent passwd 185
+ uidentry=
+ set -e
+ '[' -z '' ']'
+ '[' -w /etc/passwd ']'
+ echo '185:x:185:0:anonymous uid:/opt/spark:/bin/false'
+ SPARK_CLASSPATH=':/opt/spark/jars/*'
+ env
+ grep SPARK_JAVA_OPT_
+ sort -t_ -k4 -n
+ sed 's/[^=]*=\(.*\)/\1/g'
+ readarray -t SPARK_EXECUTOR_JAVA_OPTS
+ '[' -n '' ']'
+ '[' -z ']'
+ '[' -z ']'
+ '[' -n '' ']'
+ '[' -z x ']'
+ SPARK_CLASSPATH='/opt/hadoop/conf::/opt/spark/jars/*'
+ '[' -z x ']'
+ SPARK_CLASSPATH='/opt/spark/conf:/opt/hadoop/conf::/opt/spark/jars/*'
+ case "$1" in
+ shift 1
+ CMD=("$SPARK_HOME/bin/spark-submit" --conf 
"spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@")
+ exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf 
spark.driver.bindAddress=<ip> --deploy-mode client --proxy-user ambari-qa 
--properties-file /opt/spark/conf/spark.properties --class 
org.apache.spark.examples.DFSReadWriteTest 
local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar /etc/profile 
/tmp/
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform 
(file:/opt/spark/jars/spark-unsafe_2.12-3.1.1.jar) to constructor 
java.nio.DirectByteBuffer(long,int)
WARNING: Please consider reporting this to the maintainers of 
org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal 
reflective access operations
WARNING: All illegal access operations will be denied in a future release
22/02/25 10:33:30 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable
Setting spark.hadoop.yarn.resourcemanager.principal to ambari-qa
Performing local word count
Creating SparkSession
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
22/02/25 10:33:31 INFO SparkContext: Running Spark version 3.1.1
22/02/25 10:33:31 INFO ResourceUtils: 
==============================================================
22/02/25 10:33:31 INFO ResourceUtils: No custom resources configured for 
spark.driver.
22/02/25 10:33:31 INFO ResourceUtils: 
==============================================================
22/02/25 10:33:31 INFO SparkContext: Submitted application: DFS Read Write Test
22/02/25 10:33:31 INFO ResourceProfile: Default ResourceProfile created, 
executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , 
memory -> name: memory, amount: 512, script: , vendor: , offHeap -> name: 
offHeap, amount: 0, script: , vendor: ), task resources: Map(cpus -> name: 
cpus, amount: 1.0)
22/02/25 10:33:31 INFO ResourceProfile: Limiting resource is cpus at 1 tasks 
per executor
22/02/25 10:33:31 INFO ResourceProfileManager: Added ResourceProfile id: 0
22/02/25 10:33:31 INFO SecurityManager: Changing view acls to: 185,ambari-qa
22/02/25 10:33:31 INFO SecurityManager: Changing modify acls to: 185,ambari-qa
22/02/25 10:33:31 INFO SecurityManager: Changing view acls groups to: 
22/02/25 10:33:31 INFO SecurityManager: Changing modify acls groups to: 
22/02/25 10:33:31 INFO SecurityManager: SecurityManager: authentication 
disabled; ui acls disabled; users  with view permissions: Set(185, ambari-qa); 
groups with view permissions: Set(); users  with modify permissions: Set(185, 
ambari-qa); groups with modify permissions: Set()
22/02/25 10:33:32 INFO Utils: Successfully started service 'sparkDriver' on 
port 7078.
22/02/25 10:33:32 INFO SparkEnv: Registering MapOutputTracker
22/02/25 10:33:32 INFO SparkEnv: Registering BlockManagerMaster
22/02/25 10:33:32 INFO BlockManagerMasterEndpoint: Using 
org.apache.spark.storage.DefaultTopologyMapper for getting topology information
22/02/25 10:33:32 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
22/02/25 10:33:32 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
22/02/25 10:33:32 INFO DiskBlockManager: Created local directory at 
/var/data/spark-3b0fe4a4-edb4-4144-9f9c-74e3ea583def/blockmgr-33259dcc-20aa-47cd-b09c-8c128de5f5eb
22/02/25 10:33:32 INFO MemoryStore: MemoryStore started with capacity 117.0 MiB
22/02/25 10:33:32 INFO SparkEnv: Registering OutputCommitCoordinator
22/02/25 10:33:33 INFO Utils: Successfully started service 'SparkUI' on port 
4040.
22/02/25 10:33:33 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at 
http://spark-dfsreadwrite-09f12c7f30714a72-driver-svc.compute.svc:4040
22/02/25 10:33:33 INFO SparkContext: Added JAR 
local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar at 
file:/opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar with timestamp 
1645785211700
22/02/25 10:33:33 WARN SparkContext: The jar 
local:///opt/spark/examples/jars/spark-examples_2.12-3.1.1.jar has been added 
already. Overwriting of added jars is not supported in the current version.
22/02/25 10:33:33 INFO SparkKubernetesClientFactory: Auto-configuring K8S 
client using current context from users K8S config file
22/02/25 10:33:35 INFO ExecutorPodsAllocator: Going to request 1 executors from 
Kubernetes for ResourceProfile Id: 0, target: 1 running: 0.
22/02/25 10:33:36 INFO BasicExecutorFeatureStep: Decommissioning not enabled, 
skipping shutdown script
22/02/25 10:33:36 INFO Utils: Successfully started service 
'org.apache.spark.network.netty.NettyBlockTransferService' on port 7079.
22/02/25 10:33:36 INFO NettyBlockTransferService: Server created on 
spark-dfsreadwrite-09f12c7f30714a72-driver-svc.compute.svc:7079
22/02/25 10:33:36 INFO BlockManager: Using 
org.apache.spark.storage.RandomBlockReplicationPolicy for block replication 
policy
22/02/25 10:33:36 INFO BlockManagerMaster: Registering BlockManager 
BlockManagerId(driver, 
spark-dfsreadwrite-09f12c7f30714a72-driver-svc.compute.svc, 7079, None)
22/02/25 10:33:36 INFO BlockManagerMasterEndpoint: Registering block manager 
spark-dfsreadwrite-09f12c7f30714a72-driver-svc.compute.svc:7079 with 117.0 MiB 
RAM, BlockManagerId(driver, 
spark-dfsreadwrite-09f12c7f30714a72-driver-svc.compute.svc, 7079, None)
22/02/25 10:33:36 INFO BlockManagerMaster: Registered BlockManager 
BlockManagerId(driver, 
spark-dfsreadwrite-09f12c7f30714a72-driver-svc.compute.svc, 7079, None)
22/02/25 10:33:36 INFO BlockManager: Initialized BlockManager: 
BlockManagerId(driver, 
spark-dfsreadwrite-09f12c7f30714a72-driver-svc.compute.svc, 7079, None)
22/02/25 10:33:39 INFO 
KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Registered executor 
NettyRpcEndpointRef(spark-client://Executor) (<ip>:<port>) with ID 1,  
ResourceProfileId 0
22/02/25 10:33:39 INFO KubernetesClusterSchedulerBackend: SchedulerBackend is 
ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
22/02/25 10:33:39 INFO BlockManagerMasterEndpoint: Registering block manager 
10.42.0.221:33711 with 117.0 MiB RAM, BlockManagerId(1, <ip>, <port>, None)
Writing local file to DFS
22/02/25 10:33:39 INFO SharedState: Setting hive.metastore.warehouse.dir 
('null') to the value of spark.sql.warehouse.dir 
('file:/opt/spark/work-dir/spark-warehouse').
22/02/25 10:33:39 INFO SharedState: Warehouse path is 
'file:/opt/spark/work-dir/spark-warehouse'.
22/02/25 10:33:41 WARN Client: Exception encountered while connecting to the 
server : javax.security.sasl.SaslException: GSS initiate failed [Caused by 
GSSException: No valid credentials provided (Mechanism level: Failed to find 
any Kerberos tgt)]
22/02/25 10:33:41 WARN Client: Exception encountered while connecting to the 
server : javax.security.sasl.SaslException: GSS initiate failed [Caused by 
GSSException: No valid credentials provided (Mechanism level: Failed to find 
any Kerberos tgt)]
22/02/25 10:33:41 INFO RetryInvocationHandler: Exception while invoking 
getFileInfo of class ClientNamenodeProtocolTranslatorPB over 
<address>/<ip>:8020 after 1 fail over attempts. Trying to fail over immediately.
java.io.IOException: Failed on local exception: java.io.IOException: 
javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: 
No valid credentials provided (Mechanism level: Failed to find any Kerberos 
tgt)]; Host Details : local host is: 
"spark-dfsreadwrite-09f12c7f30714a72-driver/<ip>"; destination host is: 
"<address>":8020; 
        at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:776)
        at org.apache.hadoop.ipc.Client.call(Client.java:1480)
        at org.apache.hadoop.ipc.Client.call(Client.java:1413)
        at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
        at com.sun.proxy.$Proxy34.getFileInfo(Unknown Source)
        at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:776)
        at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown 
Source)
        at java.base/java.lang.reflect.Method.invoke(Unknown Source)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
        at 
org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
        at com.sun.proxy.$Proxy35.getFileInfo(Unknown Source)
        at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:2108)
        at 
org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1305)
        at 
org.apache.hadoop.hdfs.DistributedFileSystem$22.doCall(DistributedFileSystem.java:1301)
        at 
org.apache.hadoop.fs.FileSystemLinkResolver.resolve(FileSystemLinkResolver.java:81)
        at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:1317)
        at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:1426)
        at 
org.apache.spark.examples.DFSReadWriteTest$.main(DFSReadWriteTest.scala:115)
        at 
org.apache.spark.examples.DFSReadWriteTest.main(DFSReadWriteTest.scala)
        at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at 
java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
        at 
java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown 
Source)
        at java.base/java.lang.reflect.Method.invoke(Unknown Source)
        at 
org.apache.spark.deploy.JavaMainApplication.start(SparkApplication.scala:52)
        at 
org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:951)
        at 
org.apache.spark.deploy.SparkSubmit$$anon$1.run(SparkSubmit.scala:165)
        at 
org.apache.spark.deploy.SparkSubmit$$anon$1.run(SparkSubmit.scala:163)
        at java.base/java.security.AccessController.doPrivileged(Native Method)
        at java.base/javax.security.auth.Subject.doAs(Unknown Source)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
        at 
org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:163)
        at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:203)
        at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:90)
        at 
org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1030)
        at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1039)
        at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.IOException: javax.security.sasl.SaslException: GSS initiate 
failed [Caused by GSSException: No valid credentials provided (Mechanism level: 
Failed to find any Kerberos tgt)]
        at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:688)
        at java.base/java.security.AccessController.doPrivileged(Native Method)
        at java.base/javax.security.auth.Subject.doAs(Unknown Source)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
        at 
org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:651)
        at 
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:738)
        at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:376)
        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1529)
        at org.apache.hadoop.ipc.Client.call(Client.java:1452)
        ... 36 more
Caused by: javax.security.sasl.SaslException: GSS initiate failed [Caused by 
GSSException: No valid credentials provided (Mechanism level: Failed to find 
any Kerberos tgt)]
        at 
jdk.security.jgss/com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(Unknown
 Source)
        at 
org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:414)
        at 
org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:561)
        at org.apache.hadoop.ipc.Client$Connection.access$1900(Client.java:376)
        at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:730)
        at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:726)
        at java.base/java.security.AccessController.doPrivileged(Native Method)
        at java.base/javax.security.auth.Subject.doAs(Unknown Source)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1746)
        at 
org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:726)
        ... 39 more
Caused by: GSSException: No valid credentials provided (Mechanism level: Failed 
to find any Kerberos tgt)
        at 
java.security.jgss/sun.security.jgss.krb5.Krb5InitCredential.getInstance(Unknown
 Source)
        at 
java.security.jgss/sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Unknown
 Source)
        at 
java.security.jgss/sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Unknown
 Source)
        at 
java.security.jgss/sun.security.jgss.GSSManagerImpl.getMechanismContext(Unknown 
Source)
        at 
java.security.jgss/sun.security.jgss.GSSContextImpl.initSecContext(Unknown 
Source)
        at 
java.security.jgss/sun.security.jgss.GSSContextImpl.initSecContext(Unknown 
Source)
        ... 49 more
{code}
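For context: the GSSException above means that, inside the driver pod, the 
proxy user's Hadoop UGI holds neither a Kerberos TGT nor HDFS delegation 
tokens when it first talks to the NameNode. As a point of reference, here is 
a minimal sketch I put together (my own hypothetical code, not Spark's) of 
the standard Hadoop proxy-user pattern that has to succeed somewhere for this 
job; the keytab path, principal, and proxy user are the ones from the failing 
command above.
{code:scala}
// Hypothetical standalone sketch (not Spark source): the Hadoop
// proxy-user + delegation-token handshake. Keytab path, principal, and
// proxy user are taken from the failing command above.
import java.security.PrivilegedExceptionAction

import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.FileSystem
import org.apache.hadoop.security.{Credentials, UserGroupInformation}

object ProxyUserTokenSketch {
  def main(args: Array[String]): Unit = {
    // Assumes core-site.xml/hdfs-site.xml are on the classpath.
    val conf = new Configuration()
    UserGroupInformation.setConfiguration(conf)

    // 1. Kerberos login as the real user from the keytab (what kinit provides).
    val realUgi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(
      "principal@REALM", "/etc/test.keytab")

    // 2. Impersonate the proxy user on top of the real user's TGT.
    val proxyUgi = UserGroupInformation.createProxyUser("ambari-qa", realUgi)

    // 3. While the TGT is available, obtain HDFS delegation tokens for the
    //    proxy user. The "Failed to find any Kerberos tgt" error suggests
    //    this step never happens for (or never reaches) the driver pod.
    val creds = new Credentials()
    proxyUgi.doAs(new PrivilegedExceptionAction[Unit] {
      override def run(): Unit = {
        FileSystem.get(conf).addDelegationTokens("ambari-qa", creds)
        ()
      }
    })
    println(s"obtained ${creds.numberOfTokens()} delegation token(s)")
  }
}
{code}
On YARN, spark-submit performs the equivalent of this at submission time and 
ships the resulting tokens with the application, which would explain why the 
YARN run below never needs a TGT on the cluster side.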
The main question is why the very same case works fine with YARN. Is this a 
restriction of spark-submit on Kubernetes, or a configuration issue? The 
equivalent YARN command below succeeds:
{code:java}
kinit -kt test.keytab principal@REALM
spark-submit \
--class org.apache.spark.examples.DFSReadWriteTest \
--deploy-mode client \
--proxy-user ambari-qa \
--conf spark.app.name=spark-dfsreadwrite \
--conf spark.driver.cores=1 \
--conf spark.driver.memory=512m \
--conf spark.executor.instances=1 \
--conf spark.executor.cores=1 \
--conf spark.executor.memory=512m \
/opt/spark/examples/jars/spark-examples_2.12-3.0.1.jar \
/etc/profile /tmp
{code}
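To compare the two environments, a small diagnostic class (again a sketch of 
mine, not part of the reproducer) can be submitted in place of 
DFSReadWriteTest to print what the current UGI actually holds. On YARN I 
would expect a PROXY authentication method plus HDFS_DELEGATION_TOKEN 
entries; judging by the log above, on Kubernetes the driver ends up with 
neither.
{code:scala}
// Hypothetical diagnostic: print the authentication method and tokens of
// the current Hadoop user (run it in place of the example's main class).
import scala.collection.JavaConverters._ // Scala 2.12, matching the image

import org.apache.hadoop.security.UserGroupInformation

object UgiDiagnostics {
  def main(args: Array[String]): Unit = {
    val ugi = UserGroupInformation.getCurrentUser
    println(s"user=${ugi.getUserName} auth=${ugi.getAuthenticationMethod}")
    println(s"hasKerberosCredentials=${ugi.hasKerberosCredentials}")
    ugi.getCredentials.getAllTokens.asScala.foreach { token =>
      println(s"token kind=${token.getKind} service=${token.getService}")
    }
  }
}
{code}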


