pan3793 commented on issue #6393: URL: https://github.com/apache/kyuubi/issues/6393#issuecomment-2120374734
Hadoop Security is always a complex topic, so let me briefly clarify how it works in the Kyuubi system. I assume you have basic knowledge of Kerberos, Hadoop User Impersonation (Proxy User), and Hadoop Delegation Tokens (DT).

The basic pipeline of Kyuubi is: Client => Kyuubi Server => Spark Driver

The first hop, Client => Kyuubi Server, supports several authentication methods including Kerberos, LDAP, etc. It is responsible for verifying the legitimacy of the connected user and providing a trusted username (the session user) to the next system.

The Kyuubi Server then uses the session user to find or launch a suitable Spark Driver. Assuming there is no existing Spark Driver, the Kyuubi Server assembles a `spark-submit` command and runs it in a sub-process to launch one with `--proxy-user <session user>`.

For Kerberized environments, there are typically two ways to launch a Spark application (see the sketch below):

1. run `kinit` with a superuser's keytab first to generate a TGT cache, then perform `spark-submit --proxy-user <session user>` to generate and distribute DTs
2. perform `spark-submit --principal <session user> --keytab </path/of/session-user.keytab>`

The principle here is: we must **NOT** distribute the superuser's keytab to the Spark app's local cache due to security concerns, but it is safe to distribute the session user's keytab and transient DTs.

For case 1 (that's your case), we don't need to maintain all session users' keytabs, so it is the Kyuubi default approach for the Kerberos case. However, it requires that someone run `kinit` periodically to refresh the TGT cache. The Kyuubi Server takes care of that if `core-site.xml` is configured properly (`hadoop.security.authentication=KERBEROS`). You can check the Kyuubi Server logs to see whether it's working, or run `klist` (as the OS user that runs the Kyuubi Server process) inside the Kyuubi Server Pod to check whether the TGT cache is available.

`core-site.xml` should also be visible to `spark-submit`, so that `spark-submit` knows it should request DTs and distribute them to the Spark Driver Pod. You can run `kubectl describe pod <spark-driver-pod>` to check whether there is a secret named `*-delegation-tokens` mounted to the pod, and an env `HADOOP_TOKEN_FILE_LOCATION` pointing to `/mnt/secrets/hadoop-credentials/hadoop-tokens`. The Spark Driver Pod can then pick up the DTs and use them to access HDFS, HMS, etc. The verification commands are sketched below as well.
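To make the two launch options more concrete, here is a minimal shell sketch. The principal names, keytab paths, Kubernetes master URL, and the example application jar are placeholders for illustration, not values taken from your environment:

```shell
# Option 1 (Kyuubi's default for Kerberos): kinit as a superuser, then impersonate the session user.
# The superuser must be allowed to impersonate others via the hadoop.proxyuser.<superuser>.hosts/groups
# settings in core-site.xml.
kinit -kt /etc/security/keytabs/superuser.keytab superuser@EXAMPLE.COM

spark-submit \
  --proxy-user alice \
  --master k8s://https://<k8s-apiserver>:6443 \
  --deploy-mode cluster \
  --class com.example.MyApp \
  local:///opt/app/my-app.jar

# Option 2: submit with the session user's own principal and keytab.
# This requires keeping a keytab for every session user, which Kyuubi avoids by default.
spark-submit \
  --principal alice@EXAMPLE.COM \
  --keytab /etc/security/keytabs/alice.keytab \
  --master k8s://https://<k8s-apiserver>:6443 \
  --deploy-mode cluster \
  --class com.example.MyApp \
  local:///opt/app/my-app.jar
```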
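And a minimal sketch of the verification steps described above; the pod names are placeholders:

```shell
# 1. Inside the Kyuubi Server Pod, check that a valid TGT cache exists.
#    It must be checked as the same OS user that runs the Kyuubi Server process.
kubectl exec -it <kyuubi-server-pod> -- klist

# 2. On the Spark Driver Pod, confirm that spark-submit requested and distributed DTs:
#    expect a mounted secret named "*-delegation-tokens" and an env
#    HADOOP_TOKEN_FILE_LOCATION pointing to /mnt/secrets/hadoop-credentials/hadoop-tokens.
kubectl describe pod <spark-driver-pod> | grep -iE 'delegation-tokens|HADOOP_TOKEN_FILE_LOCATION'
```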
