[
https://issues.apache.org/jira/browse/SPARK-16711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Apache Spark reassigned SPARK-16711:
------------------------------------
Assignee: (was: Apache Spark)
> YarnShuffleService doesn't re-init properly on YARN rolling upgrade
> -------------------------------------------------------------------
>
> Key: SPARK-16711
> URL: https://issues.apache.org/jira/browse/SPARK-16711
> Project: Spark
> Issue Type: Bug
> Components: Shuffle, YARN
> Affects Versions: 1.5.2
> Reporter: Thomas Graves
>
> When a YARN rolling upgrade happens, the Spark YarnShuffleService does not
> re-initialize the tokens soon enough. Running applications then fail with
> NullPointerExceptions rather than IOExceptions, so clients do not retry, and
> the application fails outright when it should have simply retried and
> succeeded.
> 2016-07-22 23:22:05,460 [shuffle-server-1] ERROR server.TransportRequestHandler: Error while invoking RpcHandler#receive() on RPC id 6235606084052282795
> java.lang.NullPointerException: Password cannot be null if SASL is enabled
>     at org.spark-project.guava.base.Preconditions.checkNotNull(Preconditions.java:208)
>     at org.apache.spark.network.sasl.SparkSaslServer.encodePassword(SparkSaslServer.java:196)
>     at org.apache.spark.network.sasl.SparkSaslServer$DigestCallbackHandler.handle(SparkSaslServer.java:166)
>     at com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:589)
>     at com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:244)
>     at org.apache.spark.network.sasl.SparkSaslServer.response(SparkSaslServer.java:119)
>     at org.apache.spark.network.sasl.SaslRpcHandler.receive(SaslRpcHandler.java:101)
>     at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:149)
>     at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:102)
>     at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:104)
>     at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
>     at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
>     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
>     at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:254)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
>     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
>     at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
>     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
>     at org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:86)
>     at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
>     at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
>     at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787)
>     at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130)
>     at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>     at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>     at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>     at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>     at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:116)
>     at java.lang.Thread.run(Thread.java:745)
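
The failure mode described in the issue (a missing secret surfacing as a NullPointerException instead of a retryable IOException) can be illustrated with a small standalone sketch. This is not Spark's code and not the eventual fix: the class SecretLookupSketch, the map secrets, and the methods secretFor and retryOnIo are all hypothetical names chosen for illustration. The sketch only assumes what the report states, namely that clients retry on IOException and give up on anything else.

```java
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical illustration only; none of these names exist in Spark.
class SecretLookupSketch {
    // appId -> secret. Empty here models a shuffle service that has
    // restarted but has not yet re-registered the running applications.
    static final Map<String, String> secrets = new ConcurrentHashMap<>();

    // Reports a missing secret as an IOException instead of letting a null
    // reach a not-null precondition, so a caller can treat it as transient.
    static String secretFor(String appId) throws IOException {
        String secret = secrets.get(appId);
        if (secret == null) {
            throw new IOException("No secret registered yet for " + appId);
        }
        return secret;
    }

    // Retries only when the call fails with an IOException; any other
    // exception (for example a NullPointerException) propagates at once.
    static <T> T retryOnIo(Callable<T> call, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        IOException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (IOException e) {
                last = e; // transient failure: try again
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        int[] attempts = {0};
        String secret = retryOnIo(() -> {
            // Simulate the service re-registering the secret between attempts.
            if (++attempts[0] == 2) {
                secrets.put("app_1", "s3cret");
            }
            return secretFor("app_1");
        }, 3);
        System.out.println(secret + " after " + attempts[0] + " attempts");
    }
}
```

In this sketch the first attempt fails with an IOException and the second succeeds once the secret is registered. Had the lookup thrown a NullPointerException instead, retryOnIo would not have retried, which mirrors the behavior the reporter describes.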