[ https://issues.apache.org/jira/browse/SPARK-21827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16849757#comment-16849757 ]

Sébastien BARNOUD commented on SPARK-21827:
-------------------------------------------

Hi,

For HBase, I found the exact reason for this issue:

With SASL, each message carries a sequence number, and messages must be sent 
over the network in sequence order. If they are not (which is currently the 
case for at least HBase), we get this log message from DigestMD5Base (a JDK 
class):

"DIGEST41:Unmatched MACs"

and the message is silently ignored. In fact, we should get a SaslException 
([https://github.com/openjdk/jdk/blob/master/src/java.security.sasl/share/classes/com/sun/security/sasl/digest/DigestMD5Base.java]):
{code}
// From DigestMD5Base.unwrap(): the out-of-order check that should
// surface as a SaslException
if (peerSeqNum != networkByteOrderToInt(seqNum, 0, 4)) {
    throw new SaslException("DIGEST-MD5: Out of order " +
        "sequencing of messages from server. Got: " +
        networkByteOrderToInt(seqNum, 0, 4) + " Expected: " +
        peerSeqNum);
}
{code}

But instead we get the DIGEST41 message, because the sequence number is part 
of the MAC computation, so the MAC check fails before the sequence-number 
check is ever reached.
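
To illustrate the mechanism, here is a minimal, self-contained sketch, not 
the actual JDK code (the class name SaslOrderingSketch, the unwrap signature 
and the hmacMd5 helper are made up for the example): the receiver verifies a 
MAC that covers the expected sequence number before it performs the explicit 
ordering check, so a reordered message is dropped with "Unmatched MACs" and 
the descriptive SaslException is never thrown:
{code}
import java.util.Arrays;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import javax.security.sasl.SaslException;

public class SaslOrderingSketch {

    private int peerSeqNum = 0; // next sequence number this side expects

    // As in DIGEST-MD5 (RFC 2831), the MAC covers seqNum || payload.
    static byte[] hmacMd5(byte[] key, int seqNum, byte[] payload) throws Exception {
        Mac mac = Mac.getInstance("HmacMD5");
        mac.init(new SecretKeySpec(key, "HmacMD5"));
        mac.update(new byte[] {
            (byte) (seqNum >>> 24), (byte) (seqNum >>> 16),
            (byte) (seqNum >>> 8), (byte) seqNum });
        return mac.doFinal(payload);
    }

    byte[] unwrap(byte[] key, byte[] payload, int seqNum, byte[] receivedMac)
            throws Exception {
        // 1) MAC check first. The expected MAC is computed with the
        //    sequence number this side expects, so a reordered message
        //    fails here and is silently dropped.
        byte[] expectedMac = hmacMd5(key, peerSeqNum, payload);
        if (!Arrays.equals(expectedMac, receivedMac)) {
            System.out.println("DIGEST41:Unmatched MACs");
            return new byte[0]; // message ignored, no exception raised
        }
        // 2) The explicit ordering check below would raise a clear
        //    SaslException, but for reordered traffic it is never reached.
        if (peerSeqNum != seqNum) {
            throw new SaslException("DIGEST-MD5: Out of order sequencing");
        }
        peerSeqNum++;
        return payload;
    }

    public static void main(String[] args) throws Exception {
        byte[] ki = "shared-integrity-key".getBytes();
        SaslOrderingSketch receiver = new SaslOrderingSketch();
        // The sender wraps message #0, then message #1 ...
        byte[] m1 = "second".getBytes();
        byte[] mac1 = hmacMd5(ki, 1, m1);
        // ... but the network delivers #1 first: only the MAC mismatch
        // is logged, hiding the real (ordering) problem.
        receiver.unwrap(ki, m1, 1, mac1); // prints DIGEST41:Unmatched MACs
    }
}
{code}
This matches the log excerpt in the issue below: an INFO line 
"DIGEST41:Unmatched MACs" followed by a broken frame, with no exception 
naming the real cause.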

So, to summarize:
-) a JDK issue that hides the real problem (a MAC mismatch is reported 
instead of a bad sequence number);
-) a bug in at least HBase, and probably in other Hadoop components, that 
causes SASL messages to be sent in an order different from their SASL 
sequence numbers.

Regards,


> Task fail due to executor exception when enable Sasl Encryption
> ---------------------------------------------------------------
>
>                 Key: SPARK-21827
>                 URL: https://issues.apache.org/jira/browse/SPARK-21827
>             Project: Spark
>          Issue Type: Bug
>          Components: Shuffle, Spark Core
>    Affects Versions: 1.6.1, 2.1.1, 2.2.0
>         Environment: OS: RedHat 7.1 64bit
>            Reporter: Yishan Jiang
>            Priority: Major
>
> We hit this issue with authentication and SASL encryption on many versions; 
> for example, on 1.6.1 we appended configuration like this:
> spark.local.dir /tmp/test-161
> spark.shuffle.service.enabled true
> *spark.authenticate true*
> *spark.authenticate.enableSaslEncryption true*
> *spark.network.sasl.serverAlwaysEncrypt true*
> spark.authenticate.secret e25d4369-bec3-4266-8fc5-fb6d4fcee66f
> spark.history.ui.port 18089
> spark.shuffle.service.port 7347
> spark.master.rest.port 6076
> spark.deploy.recoveryMode NONE
> spark.ssl.enabled true
> spark.executor.extraJavaOptions -Djava.security.egd=file:/dev/./urandom
> We ran a Spark example and the task failed with these exception messages:
> 17/08/22 03:56:52 INFO BlockManager: external shuffle service port = 7347
> 17/08/22 03:56:52 INFO BlockManagerMaster: Trying to register BlockManager
> 17/08/22 03:56:52 INFO sasl: DIGEST41:Unmatched MACs
> 17/08/22 03:56:52 WARN TransportChannelHandler: Exception in connection from 
> cws57n6.ma.platformlab.ibm.com/172.29.8.66:49394
> java.lang.IllegalArgumentException: Frame length should be positive: 
> -5594407078713290673       
>         at 
> org.spark-project.guava.base.Preconditions.checkArgument(Preconditions.java:119)
>         at 
> org.apache.spark.network.util.TransportFrameDecoder.decodeNext(TransportFrameDecoder.java:135)
>         at 
> org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:82)
>         at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>         at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>         at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
>         at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
>         at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>         at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>         at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>         at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>         at java.lang.Thread.run(Thread.java:785)
> 17/08/22 03:56:52 ERROR TransportResponseHandler: Still have 1 requests 
> outstanding when connection from 
> cws57n6.ma.platformlab.ibm.com/172.29.8.66:49394 is closed
> 17/08/22 03:56:52 WARN NettyRpcEndpointRef: Error sending message [message = 
> RegisterBlockManager(BlockManagerId(fe9d31da-f70c-40a2-9032-05a5af4ba4c5, 
> cws58n1.ma.platformlab.ibm.com, 45852),2985295872,NettyRpcEndpointRef(null))] 
> in 1 attempts
> java.lang.IllegalArgumentException: Frame length should be positive: 
> -5594407078713290673
>         at 
> org.spark-project.guava.base.Preconditions.checkArgument(Preconditions.java:119)
>         at 
> org.apache.spark.network.util.TransportFrameDecoder.decodeNext(TransportFrameDecoder.java:135)
>         at 
> org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:82)
>         at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:308)
>         at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:294)
>         at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:846)
>         at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:131)
>         at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>         at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>         at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>         at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>         at java.lang.Thread.run(Thread.java:785)
> 17/08/22 03:56:55 ERROR TransportClient: Failed to send RPC 
> 9091046580632843491 to cws57n6.ma.platformlab.ibm.com/172.29.8.66:49394: 
> java.nio.channels.ClosedChannelException
> java.nio.channels.ClosedChannelException
> 17/08/22 03:56:55 WARN NettyRpcEndpointRef: Error sending message [message = 
> RegisterBlockManager(BlockManagerId(fe9d31da-f70c-40a2-9032-05a5af4ba4c5, 
> cws58n1.ma.platformlab.ibm.com, 45852),2985295872,NettyRpcEndpointRef(null))] 
> in 2 attempts
> java.io.IOException: Failed to send RPC 9091046580632843491 to 
> cws57n6.ma.platformlab.ibm.com/172.29.8.66:49394: 
> java.nio.channels.ClosedChannelException
>         at 
> org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:239)
>         at 
> org.apache.spark.network.client.TransportClient$3.operationComplete(TransportClient.java:226)
>         at 
> io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:680)
>         at 
> io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:567)
>         at 
> io.netty.util.concurrent.DefaultPromise.tryFailure(DefaultPromise.java:424)
>         at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetFailure(AbstractChannel.java:801)
>         at 
> io.netty.channel.AbstractChannel$AbstractUnsafe.write(AbstractChannel.java:699)
>         at 
> io.netty.channel.DefaultChannelPipeline$HeadContext.write(DefaultChannelPipeline.java:1122)
>         at 
> io.netty.channel.AbstractChannelHandlerContext.invokeWrite(AbstractChannelHandlerContext.java:633)
>         at 
> io.netty.channel.AbstractChannelHandlerContext.access$1900(AbstractChannelHandlerContext.java:32)
>         at 
> io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.write(AbstractChannelHandlerContext.java:908)
>         at 
> io.netty.channel.AbstractChannelHandlerContext$WriteAndFlushTask.write(AbstractChannelHandlerContext.java:960)
>         at 
> io.netty.channel.AbstractChannelHandlerContext$AbstractWriteTask.run(AbstractChannelHandlerContext.java:893)
>         at 
> io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
>         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
>         at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
>         at java.lang.Thread.run(Thread.java:785)
> Caused by: java.nio.channels.ClosedChannelException


