> On 8 Apr 2016, at 10:01, Wojciech Indyk <wojciechin...@gmail.com> wrote:
>
> Hello!
> TL;DR Could you explain how (and which) Kerberos tokens should be
> delegated from driver to workers? Does it depend on spark mode?

Hadoop tokens, not Kerberos tickets, though the original Kerberos tickets are used to acquire the tokens.

The most up-to-date coverage of the topic in general is in fact:

http://hortonworks.com/webinar/hadoop-and-kerberos-the-madness-beyond-the-gate/
https://www.gitbook.com/book/steveloughran/kerberos_and_hadoop/details

>
> I have a Hadoop cluster HDP 2.3 with Kerberos. I use spark-sql (1.6.1
> compiled with hadoop 2.7.1 and hive 1.2.1) on yarn-cluster mode to
> query my hive tables.
> 1. When I query hive table stored in HDFS everything is fine. (assume
> there is no problem with my app, config and credentials setup)
> 2. When I try to query external table of HBase (defined in Hive using
> HBaseHandler) I have a permissions problem on RPC call from
> Spark-workers to HBase region server. (there is no problem to connect
> HBaseMaster from driver, Zookeepers from both driver and workers)
> 3. When I query the HBase table by hive (beeswax) everything is ok.
> (assume there is no problem with HBaseHandler)
>
> After some time of debugging (and write some additional logging) I see
> the driver has (and delegates) only:
> 16/03/31 15:03:52 DEBUG YarnSparkHadoopUtil: token for:
> 16/03/31 15:03:52 DEBUG YarnSparkHadoopUtil: token for: 172.xx.xx102:8188
> 16/03/31 15:03:52 DEBUG YarnSparkHadoopUtil: token for: ha-hdfs:dataocean
> Which means there are only credentials for YARN and HDFS. I am curious
> is it proper behavior? I see another user has similar doubt:
> https://issues.apache.org/jira/browse/SPARK-12279?focusedCommentId=15067020&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-15067020
>
> Could you explain how (and which) Kerberos tokens should be delegated
> from driver to workers? Does it depend on spark mode? As I saw in the
> code the method obtainTokenForHBase is calling when yarn-client mode
> is on, but not for yarn-cluster. Am I right? Is it ok?
>

The tokens are picked up in both cases: Spark introspects on Hive and HBase if they are on the classpath, looks at their configs, decides if tokens are needed, and asks for them if it thinks they are.

They're then attached to the AM launch context, and passed down to the containers afterwards.

See also:
https://github.com/steveloughran/spark/blob/stevel/feature/SPARK-13148-oozie/docs/running-on-yarn.md
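
To make the "is it on the classpath, and does its config ask for security" decision concrete, here is a rough, self-contained sketch in plain Java. It is not Spark's code: the class TokenDecisionSketch and its methods are hypothetical names made up for illustration, and a plain Map stands in for the HBase configuration. Only the probed class name and the hbase.security.authentication key are real HBase identifiers.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative only: models the decision described above, not Spark's actual implementation.
final class TokenDecisionSketch {

    // Real HBase class name, used here purely as a classpath probe.
    static final String HBASE_PROBE_CLASS = "org.apache.hadoop.hbase.HBaseConfiguration";

    /** Returns true if the named class can be located, without initializing it. */
    static boolean onClasspath(String className) {
        try {
            Class.forName(className, false, TokenDecisionSketch.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** A token is only worth requesting if the service is present and configured for Kerberos. */
    static boolean needsToken(boolean servicePresent, Map<String, String> serviceConf) {
        return servicePresent
            && "kerberos".equalsIgnoreCase(serviceConf.get("hbase.security.authentication"));
    }

    public static void main(String[] args) {
        // Assumption: a Map stands in for whatever hbase-site.xml would provide.
        Map<String, String> conf = new HashMap<>();
        conf.put("hbase.security.authentication", "kerberos");

        boolean present = onClasspath(HBASE_PROBE_CLASS);
        System.out.println("HBase on classpath: " + present);
        System.out.println("would request HBase token: " + needsToken(present, conf));
    }
}
```

Under this model, if the HBase jars or the HBase configuration are not visible to the process that collects the tokens, no HBase token would be requested at all, and only the HDFS and YARN ones would show up in the log.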