Hi All

I am trying to access a SQLServer instance that uses Kerberos for
authentication from Spark. I can successfully connect to SQLServer from the
driver node, but any connection to SQLServer from the executors fails with
"Failed to find any Kerberos tgt".
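
For reference, the executor-side access is roughly the sketch below, a plain
JDBC read inside a map (host, database and driver options are placeholders
for our actual setup):

import java.sql.DriverManager

val url = "jdbc:sqlserver://sqlhost:1433;databaseName=mydb;" +
  "integratedSecurity=true;authenticationScheme=JavaKerberos"

sc.parallelize(0 to 10).map { _ =>
  // this connection attempt is what throws "Failed to find any Kerberos tgt"
  val conn = DriverManager.getConnection(url)
  try conn.getMetaData.getDatabaseProductName finally conn.close()
}.collect()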

org.apache.hadoop.security.UserGroupInformation.getCurrentUser on the
driver returns myPrincipal (auth:KERBEROS) as expected. Checking the same
on the executors with

import sys.process._
import org.apache.hadoop.security.UserGroupInformation

sc.parallelize(0 to 10).map { _ =>
  (("hostname".!!).trim, UserGroupInformation.getCurrentUser.toString)
}.collect.distinct

returns

Array((hostname1, myprincipal (auth:SIMPLE)), (hostname2, myprincipal (auth:SIMPLE)))

i.e. the executors fall back to SIMPLE auth and hold no Kerberos credentials.


I tried passing the keytab and logging in explicitly from the executors,
but that didn't help either.

import org.apache.spark.SparkFiles
import org.apache.spark.deploy.SparkHadoopUtil

sc.parallelize(0 to 10).map { _ =>
  (SparkHadoopUtil.get.loginUserFromKeytab("myprincipal",
     SparkFiles.get("myprincipal.keytab")),
   ("hostname".!!).trim,
   UserGroupInformation.getCurrentUser.toString)
}.collect.distinct
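
For completeness, here is the fuller pattern I understand is needed on each
executor (log in from the keytab, then wrap the JDBC call in a doAs); the
principal, keytab and url are placeholders, with url as in the earlier sketch:

import java.security.PrivilegedExceptionAction
import java.sql.{Connection, DriverManager}

// Minimal sketch, assuming the keytab was shipped to the executors
// (e.g. via --files); principal, keytab and url are placeholders
val ugi = UserGroupInformation.loginUserFromKeytabAndReturnUGI(
  "myprincipal", SparkFiles.get("myprincipal.keytab"))

val conn = ugi.doAs(new PrivilegedExceptionAction[Connection] {
  override def run(): Connection = DriverManager.getConnection(url)
})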

Digging deeper I found SPARK-6207 and came across code in the YARN Client
that obtains delegation tokens for each Kerberised service accessed from
the executors, such as

obtainTokensForNamenodes(nns, hadoopConf, credentials)
obtainTokenForHiveMetastore(hadoopConf, credentials)
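
As I understand it (an assumption on my part), each of these helpers obtains
a delegation token on the driver and registers it in the job's Credentials,
which YARN then ships to the executors, roughly:

import org.apache.hadoop.io.Text
import org.apache.hadoop.security.Credentials
import org.apache.hadoop.security.token.{Token, TokenIdentifier}

// Hypothetical sketch of the general shape: fetch a token from the
// service, then register it under an alias in the job credentials
def obtainTokenForMyService(credentials: Credentials): Unit = {
  val token = new Token[TokenIdentifier]()  // in reality fetched from the service
  credentials.addToken(new Text("my-service"), token)
}

If that reading is right, there is no equivalent hook for an external
service such as SQLServer.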

I was wondering if anyone has been successful in accessing external
resources (running outside the Hadoop cluster) secured by Kerberos from
Spark executors running on YARN.



Regards
Deenar


On 20 April 2015 at 21:58, Andrew Lee <alee...@hotmail.com> wrote:

> Hi All,
>
> Affected versions: Spark 1.2.1 / 1.2.2 / 1.3-rc1
>
> Posting this problem to the user group first to see if someone is
> encountering the same problem.
>
> When submitting Spark jobs that invoke HiveContext APIs on a Kerberos
> Hadoop + YARN (2.4.1) cluster,
> I'm getting this error.
>
> javax.security.sasl.SaslException: GSS initiate failed [Caused by
> GSSException: No valid credentials provided (Mechanism level: Failed to
> find any Kerberos tgt)]
>
> Apparently, the Kerberos ticket is not on the remote data nodes or
> compute nodes, since we don't deploy Kerberos tickets there, and doing so
> would not be good practice anyway. On the other hand, we can't just SSH to
> every machine and run kinit for those users; that is neither practical nor
> secure.
>
> The point here is: shouldn't a delegation token be obtained during the
> doAs, so that the token is used instead of the ticket?
> I'm trying to understand what is missing in Spark's HiveContext API, given
> that a normal MapReduce job invoking the Hive APIs works, but Spark SQL
> does not. Any insights or feedback are appreciated.
>
> Anyone got this running without pre-deploying (pre-initializing) all
> tickets node by node? Is this worth filing a JIRA?
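>
> For comparison, here is a minimal sketch (paths and configuration are
> placeholders) of what I believe plain MapReduce does at submission time:
> it collects delegation tokens into the job credentials, which then travel
> with the job:
>
> import org.apache.hadoop.conf.Configuration
> import org.apache.hadoop.fs.Path
> import org.apache.hadoop.mapreduce.Job
> import org.apache.hadoop.mapreduce.security.TokenCache
>
> // Sketch: gather HDFS delegation tokens for the paths the job will read
> val job = Job.getInstance(new Configuration())
> TokenCache.obtainTokensForNamenodes(
>   job.getCredentials, Array(new Path("/user/alee/data")), job.getConfiguration)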
>
>
>
> 15/03/25 18:59:08 INFO hive.metastore: Trying to connect to metastore with
> URI thrift://alee-cluster.test.testserver.com:9083
> 15/03/25 18:59:08 ERROR transport.TSaslTransport: SASL negotiation failure
> javax.security.sasl.SaslException: GSS initiate failed [Caused by
> GSSException: No valid credentials provided (Mechanism level: Failed to
> find any Kerberos tgt)]
> at
> com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:212)
> at
> org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
> at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:253)
> at
> org.apache.thrift.transport.TSaslClientTransport.open(TSaslClientTransport.java:37)
> at
> org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:52)
> at
> org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport$1.run(TUGIAssumingTransport.java:49)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1556)
> at
> org.apache.hadoop.hive.thrift.client.TUGIAssumingTransport.open(TUGIAssumingTransport.java:49)
> at
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.open(HiveMetaStoreClient.java:336)
> at
> org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:214)
> at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> at
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
> at
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
> at
> org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1410)
> at
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:62)
> at
> org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:72)
> at
> org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:2453)
> at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:2465)
> at
> org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:340)
> at
> org.apache.spark.sql.hive.HiveContext$$anonfun$4.apply(HiveContext.scala:235)
> at
> org.apache.spark.sql.hive.HiveContext$$anonfun$4.apply(HiveContext.scala:231)
> at scala.Option.orElse(Option.scala:257)
> at
> org.apache.spark.sql.hive.HiveContext.x$3$lzycompute(HiveContext.scala:231)
> at org.apache.spark.sql.hive.HiveContext.x$3(HiveContext.scala:229)
> at
> org.apache.spark.sql.hive.HiveContext.hiveconf$lzycompute(HiveContext.scala:229)
> at org.apache.spark.sql.hive.HiveContext.hiveconf(HiveContext.scala:229)
> at
> org.apache.spark.sql.hive.HiveMetastoreCatalog.<init>(HiveMetastoreCatalog.scala:55)
> at
> org.apache.spark.sql.hive.HiveContext$$anon$2.<init>(HiveContext.scala:253)
> at
> org.apache.spark.sql.hive.HiveContext.catalog$lzycompute(HiveContext.scala:253)
> at org.apache.spark.sql.hive.HiveContext.catalog(HiveContext.scala:253)
> at
> org.apache.spark.sql.hive.HiveContext$$anon$4.<init>(HiveContext.scala:263)
> at
> org.apache.spark.sql.hive.HiveContext.analyzer$lzycompute(HiveContext.scala:263)
> at org.apache.spark.sql.hive.HiveContext.analyzer(HiveContext.scala:262)
> at
> org.apache.spark.sql.SQLContext$QueryExecution.analyzed$lzycompute(SQLContext.scala:411)
> at
> org.apache.spark.sql.SQLContext$QueryExecution.analyzed(SQLContext.scala:411)
> at org.apache.spark.sql.SchemaRDDLike$class.$init$(SchemaRDDLike.scala:58)
> at org.apache.spark.sql.SchemaRDD.<init>(SchemaRDD.scala:108)
> at org.apache.spark.sql.hive.HiveContext.hiveql(HiveContext.scala:102)
> at org.apache.spark.sql.hive.HiveContext.hql(HiveContext.scala:106)
> at
> SparkSQLTestCase2HiveContextYarnClusterApp$.main(sparksql_hivecontext_examples_yarncluster.scala:17)
> at
> SparkSQLTestCase2HiveContextYarnClusterApp.main(sparksql_hivecontext_examples_yarncluster.scala)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at
> org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:441)
> Caused by: GSSException: No valid credentials provided (Mechanism level:
> Failed to find any Kerberos tgt)
> at
> sun.security.jgss.krb5.Krb5InitCredential.getInstance(Krb5InitCredential.java:147)
> at
> sun.security.jgss.krb5.Krb5MechFactory.getCredentialElement(Krb5MechFactory.java:121)
> at
> sun.security.jgss.krb5.Krb5MechFactory.getMechanismContext(Krb5MechFactory.java:187)
> at
> sun.security.jgss.GSSManagerImpl.getMechanismContext(GSSManagerImpl.java:223)
> at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:212)
> at sun.security.jgss.GSSContextImpl.initSecContext(GSSContextImpl.java:179)
> at
> com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:193)
> ... 48 more
>
