[ https://issues.apache.org/jira/browse/SPARK-9019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14626183#comment-14626183 ]

Bolke de Bruin edited comment on SPARK-9019 at 7/14/15 10:41 AM:
-----------------------------------------------------------------

[~srowen] Unfortunately the patch from SPARK-8851 did not solve the issue; 
the trace remains the same.

Worse, with the patch applied a user without a keytab can no longer use 
spark-submit with --master yarn-cluster at all (token renewal fails).
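
A minimal sketch of the two client login paths involved (illustrative only; 
the UserGroupInformation calls are standard Hadoop API, the surrounding 
object is assumed):

{code:scala}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.UserGroupInformation

// Sketch: the two ways a client can obtain Kerberos credentials before
// talking to a secured RM. Only the keytab login is renewable by Spark;
// a plain kinit ticket-cache login is not, which is why a renewal step
// that assumes a keytab breaks keytab-less users.
object KerberosLoginSketch {
  def main(args: Array[String]): Unit = {
    val conf = new Configuration()
    conf.set("hadoop.security.authentication", "kerberos")
    UserGroupInformation.setConfiguration(conf)

    if (args.length == 2) {
      // --principal / --keytab style: renewable, keytab-backed login
      val Array(principal, keytab) = args
      UserGroupInformation.loginUserFromKeytab(principal, keytab)
    }
    // otherwise: fall back to the ticket cache populated by kinit

    val ugi = UserGroupInformation.getCurrentUser
    println(s"user=${ugi.getUserName} " +
      s"hasKerberosCredentials=${ugi.hasKerberosCredentials} " +
      s"fromKeytab=${ugi.isFromKeytab}")
  }
}
{code}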


was (Author: bolke):
[~srowen] unfortunately the patch from SPARK-8851 did not solve the issue. 
Trace remains the same.

> spark-submit fails on yarn with kerberos enabled
> ------------------------------------------------
>
>                 Key: SPARK-9019
>                 URL: https://issues.apache.org/jira/browse/SPARK-9019
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Submit
>    Affects Versions: 1.5.0
>         Environment: Hadoop 2.6 with YARN and kerberos enabled
>            Reporter: Bolke de Bruin
>              Labels: kerberos, spark-submit, yarn
>
> It is not possible to run jobs using spark-submit on YARN with a Kerberized 
> cluster. 
> Command line:
> /usr/hdp/2.2.0.0-2041/spark-1.5.0/bin/spark-submit --principal sparkjob 
> --keytab sparkjob.keytab --num-executors 3 --executor-cores 5 
> --executor-memory 5G --master yarn-cluster /tmp/get_peers.py 
> Fails with:
> 15/07/13 22:48:31 INFO server.Server: jetty-8.y.z-SNAPSHOT
> 15/07/13 22:48:31 INFO server.AbstractConnector: Started 
> SelectChannelConnector@0.0.0.0:58380
> 15/07/13 22:48:31 INFO util.Utils: Successfully started service 'SparkUI' on 
> port 58380.
> 15/07/13 22:48:31 INFO ui.SparkUI: Started SparkUI at 
> http://10.111.114.9:58380
> 15/07/13 22:48:31 INFO cluster.YarnClusterScheduler: Created 
> YarnClusterScheduler
> 15/07/13 22:48:31 WARN metrics.MetricsSystem: Using default name DAGScheduler 
> for source because spark.app.id is not set.
> 15/07/13 22:48:32 INFO util.Utils: Successfully started service 
> 'org.apache.spark.network.netty.NettyBlockTransferService' on port 43470.
> 15/07/13 22:48:32 INFO netty.NettyBlockTransferService: Server created on 
> 43470
> 15/07/13 22:48:32 INFO storage.BlockManagerMaster: Trying to register 
> BlockManager
> 15/07/13 22:48:32 INFO storage.BlockManagerMasterEndpoint: Registering block 
> manager 10.111.114.9:43470 with 265.1 MB RAM, BlockManagerId(driver, 
> 10.111.114.9, 43470)
> 15/07/13 22:48:32 INFO storage.BlockManagerMaster: Registered BlockManager
> 15/07/13 22:48:32 INFO impl.TimelineClientImpl: Timeline service address: 
> http://lxhnl002.ad.ing.net:8188/ws/v1/timeline/
> 15/07/13 22:48:33 WARN ipc.Client: Exception encountered while connecting to 
> the server : org.apache.hadoop.security.AccessControlException: Client cannot 
> authenticate via:[TOKEN, KERBEROS]
> 15/07/13 22:48:33 INFO client.ConfiguredRMFailoverProxyProvider: Failing over 
> to rm2
> 15/07/13 22:48:33 INFO retry.RetryInvocationHandler: Exception while invoking 
> getClusterNodes of class ApplicationClientProtocolPBClientImpl over rm2 after 
> 1 fail over attempts. Trying to fail over after sleeping for 32582ms.
> java.net.ConnectException: Call From lxhnl006.ad.ing.net/10.111.114.9 to 
> lxhnl013.ad.ing.net:8032 failed on connection exception: 
> java.net.ConnectException: Connection refused; For more details see:  
> http://wiki.apache.org/hadoop/ConnectionRefused
>       at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>       at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>       at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>       at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>       at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:791)
>       at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:731)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1472)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1399)
>       at 
> org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:232)
>       at com.sun.proxy.$Proxy24.getClusterNodes(Unknown Source)
>       at 
> org.apache.hadoop.yarn.api.impl.pb.client.ApplicationClientProtocolPBClientImpl.getClusterNodes(ApplicationClientProtocolPBClientImpl.java:262)
>       at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>       at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>       at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>       at java.lang.reflect.Method.invoke(Method.java:606)
>       at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:187)
>       at 
> org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
>       at com.sun.proxy.$Proxy25.getClusterNodes(Unknown Source)
>       at 
> org.apache.hadoop.yarn.client.api.impl.YarnClientImpl.getNodeReports(YarnClientImpl.java:475)
>       at 
> org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend$$anonfun$getDriverLogUrls$1.apply(YarnClusterSchedulerBackend.scala:92)
>       at 
> org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend$$anonfun$getDriverLogUrls$1.apply(YarnClusterSchedulerBackend.scala:73)
>       at scala.Option.foreach(Option.scala:236)
>       at 
> org.apache.spark.scheduler.cluster.YarnClusterSchedulerBackend.getDriverLogUrls(YarnClusterSchedulerBackend.scala:73)
>       at 
> org.apache.spark.SparkContext.postApplicationStart(SparkContext.scala:1993)
>       at org.apache.spark.SparkContext.<init>(SparkContext.scala:544)
>       at 
> org.apache.spark.api.java.JavaSparkContext.<init>(JavaSparkContext.scala:61)
>       at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>       at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
>       at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>       at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>       at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:234)
>       at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
>       at py4j.Gateway.invoke(Gateway.java:214)
>       at 
> py4j.commands.ConstructorCommand.invokeConstructor(ConstructorCommand.java:79)
>       at py4j.commands.ConstructorCommand.execute(ConstructorCommand.java:68)
>       at py4j.GatewayConnection.run(GatewayConnection.java:207)
>       at java.lang.Thread.run(Thread.java:745)
> Caused by: java.net.ConnectException: Connection refused
>       at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>       at 
> sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
>       at 
> org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
>       at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:530)
>       at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:494)
>       at 
> org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:607)
>       at 
> org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:705)
>       at org.apache.hadoop.ipc.Client$Connection.access$2800(Client.java:368)
>       at org.apache.hadoop.ipc.Client.getConnection(Client.java:1521)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1438)
>       ... 30 more
> The same error occurs when --principal and --keytab are omitted.
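
To reproduce this outside of spark-submit, a minimal standalone sketch (a 
hypothetical diagnostic, not part of the original report) that drives the 
same YarnClient.getNodeReports() call that fails in the trace above:

{code:scala}
import org.apache.hadoop.yarn.client.api.YarnClient
import org.apache.hadoop.yarn.conf.YarnConfiguration

// Hypothetical repro: YarnClusterSchedulerBackend.getDriverLogUrls fails
// inside YarnClient.getNodeReports(), so calling it directly isolates
// the RM authentication problem from the rest of the job.
object GetNodeReportsRepro {
  def main(args: Array[String]): Unit = {
    val conf = new YarnConfiguration() // picks up HADOOP_CONF_DIR
    val client = YarnClient.createYarnClient()
    client.init(conf)
    client.start()
    try {
      val nodes = client.getNodeReports()
      println(s"got ${nodes.size()} node reports")
    } finally {
      client.stop()
    }
  }
}
{code}

Running it once after a fresh kinit and once without a ticket should show 
whether the "Client cannot authenticate via:[TOKEN, KERBEROS]" warning is an 
authentication problem rather than an RM address problem.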


