Zhongwei Zhu created SPARK-44126:
------------------------------------

             Summary: Migration shuffle to decommissioned executor should not 
count as block failure
                 Key: SPARK-44126
                 URL: https://issues.apache.org/jira/browse/SPARK-44126
             Project: Spark
          Issue Type: Bug
          Components: Spark Core
    Affects Versions: 3.4.0
            Reporter: Zhongwei Zhu


When shuffle migration to decommissioned executor, the below exception is 
thrown:

```

org.apache.spark.SparkException: Exception thrown in awaitResult:     at 
org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:301)    at 
org.apache.spark.network.BlockTransferService.uploadBlockSync(BlockTransferService.scala:122)
    at 
org.apache.spark.storage.BlockManagerDecommissioner$ShuffleMigrationRunnable.$anonfun$run$5(BlockManagerDecommissioner.scala:127)
    at 
org.apache.spark.storage.BlockManagerDecommissioner$ShuffleMigrationRunnable.$anonfun$run$5$adapted(BlockManagerDecommissioner.scala:118)
    at scala.collection.immutable.List.foreach(List.scala:431)    at 
org.apache.spark.storage.BlockManagerDecommissioner$ShuffleMigrationRunnable.run(BlockManagerDecommissioner.scala:118)
    at 
java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)    at 
java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at 
java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.base/java.lang.Thread.run(Thread.java:829)Caused by: 
java.lang.RuntimeException: 
org.apache.spark.storage.BlockSavedOnDecommissionedBlockManagerException: Block 
shuffle_2_6429_0.data cannot be saved on decommissioned executor    at 
org.apache.spark.errors.SparkCoreErrors$.cannotSaveBlockOnDecommissionedExecutorError(SparkCoreErrors.scala:238)
    at 
org.apache.spark.storage.BlockManager.checkShouldStore(BlockManager.scala:277)  
  at 
org.apache.spark.storage.BlockManager.putBlockDataAsStream(BlockManager.scala:741)
    at 
org.apache.spark.network.netty.NettyBlockRpcServer.receiveStream(NettyBlockRpcServer.scala:174)
    at 
org.apache.spark.network.server.AbstractAuthRpcHandler.receiveStream(AbstractAuthRpcHandler.java:78)
    at 
org.apache.spark.network.server.TransportRequestHandler.processStreamUpload(TransportRequestHandler.java:202)
    at 
org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:115)
    at 
org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:140)
    at 
org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:53)
    at 
io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:99)
    at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
    at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
    at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
    at 
io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
    at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442)
    at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
    at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
    at 
io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
    at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
    at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
    at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
    at 
org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:102)
    at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
    at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
    at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
    at 
org.apache.spark.network.crypto.TransportCipher$DecryptionHandler.channelRead(TransportCipher.java:190)
    at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444)
    at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
    at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412)
    at 
io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410)
    at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440)
    at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420)
    at 
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919)
    at 
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166)
    at 
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788)    
at 
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724)
    at 
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650)    
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562)    at 
io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997)
    at 
io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74)    at 
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
    at java.base/java.lang.Thread.run(Thread.java:829)

```

Then this count as block migration failure, which should not.

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to