[ 
https://issues.apache.org/jira/browse/SPARK-12007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15168926#comment-15168926
 ] 

xukun commented on SPARK-12007:
-------------------------------

[~vanzin] [~zsxwing]

When I test word count on a larger cluster(100 workers),after apply this patch, 
these exception occurs every time, but on a small cluster never.


16/02/26 10:35:04 WARN TaskSetManager: Lost task 213.0 in stage 1.0 (TID 16971, 
SZV1000041645): FetchFailed(BlockManagerId(118, 192.168.75.193, 23325), 
shuffleId=0, mapId=51, reduceId=213, message=
org.apache.spark.shuffle.FetchFailedException: java.lang.RuntimeException: 
javax.security.sasl.SaslException: DIGEST-MD5: digest response format 
violation. Mismatched nonce.
        at 
org.spark-project.guava.base.Throwables.propagate(Throwables.java:160)
        at 
org.apache.spark.network.sasl.SparkSaslServer.response(SparkSaslServer.java:121)
        at 
org.apache.spark.network.sasl.SaslRpcHandler.receive(SaslRpcHandler.java:100)
        at 
org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:128)
        at 
org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:99)
        at 
org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:104)
        at 
org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:51)
        at 
io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
        at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
        at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
        at 
io.netty.handler.timeout.IdleStateHandler.channelRead(IdleStateHandler.java:266)
        at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
        at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
        at 
io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
        at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
        at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
        at 
io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:244)
        at 
io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
        at 
io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
        at 
io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
        at 
io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
        at 
io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
        at 
io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
        at 
io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
        at 
io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
        at java.lang.Thread.run(Thread.java:745)

Do you have any idea about this?

Thanks!


> Network library's RPC layer requires a lot of copying
> -----------------------------------------------------
>
>                 Key: SPARK-12007
>                 URL: https://issues.apache.org/jira/browse/SPARK-12007
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 1.6.0
>            Reporter: Marcelo Vanzin
>            Assignee: Marcelo Vanzin
>             Fix For: 1.6.0
>
>
> The network library's RPC layer has an external API based on byte arrays, 
> instead of ByteBuffer; that requires a lot of copying since the internals of 
> the library use ByteBuffers (or rather Netty's ByteBuf), and lots of external 
> clients also use ByteBuffer.
> The extra copies could be avoided if the API used ByteBuffer instead.
> To show an extreme case, look at an RPC send via NettyRpcEnv:
> - message is encoded using JavaSerializer, resulting in a ByteBuffer
> - the ByteBuffer is copied into a byte array of the right size, since its 
> internal array may be larger than the actual data it holds
> - the network library's encoder copies the byte array into a ByteBuf
> - finally the data is written to the socket
> The intermediate 2 copies could be avoided if the API allowed the original 
> ByteBuffer to be sent instead.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

Reply via email to