pan3793 commented on issue #6393:
URL: https://github.com/apache/kyuubi/issues/6393#issuecomment-2120374734

   Hadoop security is always a complex topic, so let me briefly clarify how it works in the Kyuubi system.
   
   I assume you have basic knowledge of Kerberos, Hadoop User Impersonation (Proxy User), and Hadoop Delegation Tokens (DT).
   
   The basic pipeline of Kyuubi is:
   
   Client => Kyuubi Server => Spark Driver
   
   The first part, Client => Kyuubi Server, supports several authentication methods including Kerberos, LDAP, etc. It is responsible for ensuring the legitimacy of the connecting user and providing a trusted username (the session user) to the next system.
   
   Then the Kyuubi Server uses the session user to find or launch a proper Spark Driver. Assuming there is no existing Spark Driver, the Kyuubi Server assembles a `spark-submit` command and runs it in a sub-process to launch a Spark Driver with `--proxy-user <session user>`.
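   
   For illustration only, the assembled command is roughly of this shape; everything except `--proxy-user` below is a placeholder, not the exact values Kyuubi generates:
   
   ```bash
   # Rough shape of the command Kyuubi Server builds to launch a new Spark Driver.
   # The master URL, image, main class, and jar are deployment-specific placeholders.
   spark-submit \
     --master k8s://https://<k8s-apiserver>:6443 \
     --deploy-mode cluster \
     --proxy-user <session user> \
     --conf spark.kubernetes.container.image=<your-spark-image> \
     --class <kyuubi engine main class> \
     <kyuubi engine jar>
   ```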
   
   For Kerberized environments, there are typically two ways to launch a Spark application (see the sketch after this list):
   
   1. run `kinit` with a superuser's keytab first to generate the TGT cache, then perform `spark-submit --proxy-user <session user>` to generate and distribute DTs
   2. perform `spark-submit --principal <session user> --keytab </path/of/session-user.keytab>`
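   
   As a rough shell sketch of both cases (the keytab paths, principal, and realm are placeholders for your environment):
   
   ```bash
   # Case 1: superuser TGT cache + proxy user (Kyuubi's default for Kerberos)
   kinit -kt /path/to/superuser.keytab superuser@EXAMPLE.COM   # refresh the TGT cache
   spark-submit --proxy-user <session user> ...                # DTs are requested as the superuser on behalf of the session user
   
   # Case 2: per-user principal and keytab
   spark-submit --principal <session user principal> --keytab /path/of/session-user.keytab ...
   ```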
   
   
   The principle here is: we must **NOT** distribute the superuser's keytab to the Spark app's local cache due to security concerns, but it is safe to distribute the session user's keytab and transient DTs.
   
   For case 1 (that's your case), we don't need to maintain all session users' keytabs, so it's the default approach in Kyuubi for the Kerberos case. However, it requires someone to run `kinit` periodically to refresh the TGT cache. Kyuubi Server takes care of that if `core-site.xml` is configured properly (`hadoop.security.authentication=KERBEROS`). You can check the Kyuubi Server logs to see whether it's working, or run `klist` (as the OS user that runs the Kyuubi Server process) inside the Kyuubi Server Pod to check whether the TGT cache is available.
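   
   If you want to verify it from the command line, something like the following should work; the namespace, pod name, and `HADOOP_CONF_DIR` location are placeholders for your deployment:
   
   ```bash
   # Check that core-site.xml enables Kerberos authentication
   # (assuming HADOOP_CONF_DIR points to the directory containing core-site.xml)
   grep -A1 'hadoop.security.authentication' "$HADOOP_CONF_DIR/core-site.xml"
   
   # Check the TGT cache inside the Kyuubi Server Pod; klist must run as
   # the same OS user that runs the Kyuubi Server process
   kubectl exec -n <namespace> <kyuubi-server-pod> -- klist
   ```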
   
   `core-site.xml` should also be visible to `spark-submit`, so that `spark-submit` knows it should request DTs and distribute them to the Spark Driver Pod. You can run `kubectl describe pod <spark-driver-pod>` to check whether there is a secret named `*-delegation-tokens` mounted to the pod, and an env var `HADOOP_TOKEN_FILE_LOCATION` pointing to `/mnt/secrets/hadoop-credentials/hadoop-tokens`.
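   
   For example, a quick way to check (the namespace and pod name are placeholders):
   
   ```bash
   # Look for the *-delegation-tokens secret mounted to the Spark Driver Pod
   kubectl describe pod -n <namespace> <spark-driver-pod> | grep -i 'delegation-tokens'
   
   # Check the HADOOP_TOKEN_FILE_LOCATION env var inside the Spark Driver Pod
   kubectl exec -n <namespace> <spark-driver-pod> -- env | grep HADOOP_TOKEN_FILE_LOCATION
   ```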
   
   Then the Spark Driver Pod can pick up the DTs and use them to access HDFS, HMS, etc.

