[
https://issues.apache.org/jira/browse/SPARK-24346?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16782678#comment-16782678
]
Mohamed Mehdi BEN AISSA commented on SPARK-24346:
-------------------------------------------------
Any news !? I have exactly the same issue in the same context (HDP version) :
ERROR TransportRequestHandler: Error opening block
StreamChunkId\{streamId=1377556883266, chunkIndex=9} for request from
/10.147.167.40:39050
java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:197)
at java.io.DataInputStream.readLong(DataInputStream.java:416)
at
org.apache.spark.shuffle.IndexShuffleBlockResolver.getBlockData(IndexShuffleBlockResolver.scala:209)
at
org.apache.spark.storage.BlockManager.getBlockData(BlockManager.scala:375)
at
org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$1.apply(NettyBlockRpcServer.scala:61)
at
org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$1.apply(NettyBlockRpcServer.scala:60)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at
scala.collection.convert.Wrappers$IteratorWrapper.next(Wrappers.scala:31)
at
org.apache.spark.network.server.OneForOneStreamManager.getChunk(OneForOneStreamManager.java:92)
at
org.apache.spark.network.server.TransportRequestHandler.processFetchRequest(TransportRequestHandler.java:137)
at
org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:109)
at
org.apache.spark.network.server.TransportChannelHandler.channelRead(TransportChannelHandler.java:118)
at
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at
io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:286)
at
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at
io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:102)
at
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at
org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:85)
at
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:340)
at
io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1359)
at
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:362)
at
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:348)
at
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:935)
at
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:138)
at
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:645)
at
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:580)
at
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:497)
at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:459)
at
io.netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:858)
at
io.netty.util.concurrent.DefaultThreadFactory$DefaultRunnableDecorator.run(DefaultThreadFactory.java:138)
at java.lang.Thread.run(Thread.java:745)
> Executors are unable to fetch remote cache blocks
> -------------------------------------------------
>
> Key: SPARK-24346
> URL: https://issues.apache.org/jira/browse/SPARK-24346
> Project: Spark
> Issue Type: Bug
> Components: Shuffle, Spark Core
> Affects Versions: 2.3.0
> Environment: OS: Centos 7.3
> Cluster: Hortonwork HDP 2.6.5 with Spark 2.3.0
> Reporter: Truong Duc Kien
> Priority: Major
>
> After we upgrade from Spark 2.2.1 to Spark 2.3.0, our Spark jobs took a
> massive performance hit because executors become unable to fetch remote cache
> block from each others. The scenario is:
> 1. An executor creates a connection and sends a ChunkFetchRequest message to
> another executor.
> 2. This request arrives at the target executor, which sends back a
> ChunkFetchSuccess response
> 3. The ChunkFetchSuccess msg never arrives.
> 4. The connection between these two executors is killed by the originating
> executor after 120s of idleness. At the same time, the other executor report
> that it failed to send the ChunkFetchSuccess because the pipe is closed.
> This process repeats itself 3 times, delaying our jobs by 6 minutes, then the
> originating executor decides to stop fetching and calculates the block by
> itself and the job can continue.
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]