[ https://issues.apache.org/jira/browse/SPARK-12007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15168926#comment-15168926 ]
xukun commented on SPARK-12007:
-------------------------------

[~vanzin] [~zsxwing] When I test word count on a larger cluster (100 workers), after applying this patch, this exception occurs every time; on a small cluster it never does.

16/02/26 10:35:04 WARN TaskSetManager: Lost task 213.0 in stage 1.0 (TID 16971, SZV1000041645): FetchFailed(BlockManagerId(118, 192.168.75.193, 23325), shuffleId=0, mapId=51, reduceId=213, message=
org.apache.spark.shuffle.FetchFailedException: java.lang.RuntimeException: javax.security.sasl.SaslException: DIGEST-MD5: digest response format violation. Mismatched nonce.
	at org.spark-project.guava.base.Throwables.propagate(Throwables.java:160)
	at org.apache.spark.network.sasl.SparkSaslServer.response(SparkSaslServer.java:121)
	at org.apache.spark.network.sasl.SaslRpcHandler.receive(SaslRpcHandler.java:100)
	at org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:128)
	at org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:99)
	at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:104)
	at org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
	at io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
	at io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
	at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
	at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:244)
	at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
	at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
	at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
	at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
	at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
	at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
	at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
	at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
	at java.lang.Thread.run(Thread.java:745)

Do you have any idea about this? Thanks!

> Network library's RPC layer requires a lot of copying
> -----------------------------------------------------
>
>                 Key: SPARK-12007
>                 URL: https://issues.apache.org/jira/browse/SPARK-12007
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 1.6.0
>            Reporter: Marcelo Vanzin
>            Assignee: Marcelo Vanzin
>             Fix For: 1.6.0
>
> The network library's RPC layer has an external API based on byte arrays
> instead of ByteBuffer; that requires a lot of copying, since the internals of
> the library use ByteBuffers (or rather Netty's ByteBuf) and many external
> clients also use ByteBuffer.
> The extra copies could be avoided if the API used ByteBuffer instead.
> To show an extreme case, look at an RPC send via NettyRpcEnv:
> - the message is encoded using JavaSerializer, resulting in a ByteBuffer
> - the ByteBuffer is copied into a byte array of the right size, since its
>   internal array may be larger than the actual data it holds
> - the network library's encoder copies the byte array into a ByteBuf
> - finally, the data is written to the socket
> The two intermediate copies could be avoided if the API allowed the original
> ByteBuffer to be sent instead.

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
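The right-sizing copy described in the second step above can be sketched with plain java.nio, independent of Spark. This is a minimal illustration, not Spark's actual API: the class and method names (CopyChainDemo, toExactArray) are hypothetical, and the oversized buffer stands in for what JavaSerializer produces. The point is that a byte[]-based API forces the copy shown, while a ByteBuffer-based API could hand over a zero-copy view instead.

```java
import java.nio.ByteBuffer;

public class CopyChainDemo {

    // byte[]-based API style: the caller must produce a right-sized array,
    // which forces a copy whenever the ByteBuffer's backing array is larger
    // than the payload it holds.
    static byte[] toExactArray(ByteBuffer buf) {
        byte[] out = new byte[buf.remaining()];
        buf.get(out); // copy: ByteBuffer -> right-sized byte[]
        return out;
    }

    public static void main(String[] args) {
        // Simulate a serializer that over-allocates its backing array:
        // capacity is 1024 bytes, but the payload is only 5 bytes.
        ByteBuffer serialized = ByteBuffer.allocate(1024);
        serialized.put("hello".getBytes());
        serialized.flip(); // position=0, limit=5

        // Old path: copy into a byte[] before handing it to the network
        // layer, which would then copy it again into a Netty ByteBuf.
        byte[] copy = toExactArray(serialized.duplicate());
        System.out.println("copied " + copy.length + " of "
                + serialized.capacity() + " allocated bytes");

        // ByteBuffer-based path: a duplicate() is a zero-copy view sharing
        // the same backing array; a ByteBuffer-accepting API could wrap it
        // directly (Netty's Unpooled.wrappedBuffer does exactly this),
        // skipping both intermediate copies.
        ByteBuffer view = serialized.duplicate();
        System.out.println("view remaining=" + view.remaining()
                + ", no bytes copied");
    }
}
```

The same idea applies on the receive side: handing callers a ByteBuffer slice over the ByteBuf's memory avoids materializing a byte[] per message.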