[
https://issues.apache.org/jira/browse/HBASE-22492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16869769#comment-16869769
]
Josh Elser commented on HBASE-22492:
------------------------------------
I'm looking at this some more, and I actually think the problem is more that we
may mis-order responses in {{doRespond()}}.
Let's take responses: R1 and R2, which we should send back in that order (1,
then 2).
If we fail to write R1 (client socket not ready to read the entire response),
we'll re-queue that Call back to the response queue, but at the _front_ of the
queue. We only add elements to the back of the queue when the queue is empty
and there is no other write happening ({{responseWriteLock}}).
However, we don't hold the responseWriteLock when queueing the element to the
back of the queue. Thus, if both R1 and R2 are trying to be queued at the same
time and the queue is not empty, we have a race condition which could allow
R2 to be queued before R1.
I don't think this invalidates Sebastien's fix (the same one that Hadoop made),
just wanted to share as it makes sense to me now how we might mis-queue these
elements.
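
To illustrate the ordering requirement, here is a minimal sketch (not HBase code). It assumes a hypothetical {{OrderedResponder}} class in which the sequence number is assigned and the response is appended to the back of the queue under the same lock, so queue order always matches sequence order. The class, field, and method names are illustrative only; the counter merely stands in for the sequence number that saslServer manages.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical stand-in for a connection's response path; not HBase code. */
class OrderedResponder {
    private final ReentrantLock responseWriteLock = new ReentrantLock();
    // Each entry is {callId, seqNum}.
    private final Deque<int[]> responseQueue = new ArrayDeque<>();
    // Plays the role of the SASL sequence number (assumption for illustration).
    private int nextSeqNum = 0;

    /** Assigns a sequence number and enqueues at the back as one atomic step. */
    void respond(int callId) {
        responseWriteLock.lock();
        try {
            int seqNum = nextSeqNum++;
            responseQueue.addLast(new int[] {callId, seqNum});
        } finally {
            responseWriteLock.unlock();
        }
    }

    /** Drains the queue front to back and returns the sequence numbers in wire order. */
    List<Integer> drain() {
        List<Integer> sent = new ArrayList<>();
        responseWriteLock.lock();
        try {
            int[] response;
            while ((response = responseQueue.pollFirst()) != null) {
                sent.add(response[1]);
            }
        } finally {
            responseWriteLock.unlock();
        }
        return sent;
    }

    public static void main(String[] args) {
        OrderedResponder responder = new OrderedResponder();
        responder.respond(5801);
        responder.respond(5846);
        System.out.println("Sequence numbers in wire order: " + responder.drain());
    }
}
```

If the sequence number were assigned outside the lock, two threads could be stamped 1 and 2 and then enqueue in the opposite order, which is the kind of mis-ordering described above.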
> HBase server doesn't preserve SASL sequence number on the network
> -----------------------------------------------------------------
>
> Key: HBASE-22492
> URL: https://issues.apache.org/jira/browse/HBASE-22492
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Affects Versions: 1.1.2
> Environment: HDP 2.6.5.108-1
>
> Reporter: Sébastien BARNOUD
> Priority: Major
> Attachments: HBASE-22492.001.branch-1.patch,
> HBASE-22492.002.branch-1.patch
>
>
> When auth-conf is enabled on RPC, the server encrypts the response in
> setResponse() using saslServer. The generated cryptogram includes a sequence
> number managed by saslServer. But then, when the response is sent over the
> network, the sequence number order is not preserved.
> The client receives replies in the wrong order, leading to a log message from
> DigestMD5Base:
> {code:java}
> sasl:1481 - DIGEST41:Unmatched MACs
> {code}
> Then the message is discarded, and the client eventually times out.
> I propose a fix here:
> [https://github.com/sbarnoud/hbase-release/commit/ce9894ffe0e4039deecd1ed51fa135f64b311d41]
> It seems that any HBase 1.x is affected.
> This part of the code has been fully rewritten in HBase 2.x. I haven't done
> the analysis on HBase 2.x, which may also be affected.
>
> Here is an extract of the client log, including messages that I added to help
> me understand the problem:
> {code:java}
> …
> 2019-05-28 12:53:48,644 DEBUG [Default-IPC-NioEventLoopGroup-1-32]
> NettyRpcDuplexHandler:80 - callId: 5846 /192.163.201.65:58870 ->
> dtltstap004.fr.world.socgen/192.163.201.72:16020
> 2019-05-28 12:53:48,651 INFO [Default-IPC-NioEventLoopGroup-1-18]
> NioEventLoop:101 - SG: Channel ready to read 1315913615 unsafe 1493023957
> /192.163.201.65:44236 -> dtltstap008.fr.world.socgen/192.163.201.109:16020
> 2019-05-28 12:53:48,651 INFO [Default-IPC-NioEventLoopGroup-1-18]
> SaslUnwrapHandler:78 - SG: after unwrap:46 -> 29 for /192.163.201.65:44236
> -> dtltstap008.fr.world.socgen/192.163.201.109:16020 seqNum 150
> 2019-05-28 12:53:48,652 DEBUG [Default-IPC-NioEventLoopGroup-1-18]
> NettyRpcDuplexHandler:192 - callId: 5801 received totalSize:25 Message:20
> scannerSize:(null)/192.163.201.65:44236 ->
> dtltstap008.fr.world.socgen/192.163.201.109:16020
> 2019-05-28 12:53:48,652 INFO [Default-IPC-NioEventLoopGroup-1-18] sasl:1481
> - DIGEST41:Unmatched MACs
> 2019-05-28 12:53:48,652 WARN [Default-IPC-NioEventLoopGroup-1-18]
> SaslUnwrapHandler:70 - Sasl error (probably invalid MAC) detected for
> /192.163.201.65:44236 -> dtltstap008.fr.world.socgen/192.163.201.109:16020
> saslClient @4ac31121 ctx @14fb001d msg @140313192718406 len 118
> data:1c^G?^P?3??h?k??????"??x?$^_??^D;^]7^Es??Em?c?w^R^BL?????????x??omG?z?I???45}???dE?^\^S>D?^????/4f?^^??
> ?^E????d?????????D?kM^@^A^@^@^@? readerIndex 118 writerIndex 118 seqNum
> 152
> {code}
> We can see that the client unwraps the Sasl message with sequence number 152
> before sequence number 151 and fails with the unmatched MAC.
>
> I opened a case with Oracle because we should have gotten an error (rather
> than the message being silently ignored). The cause is that the JDK doesn't
> check integrity in the right order:
> [https://github.com/openjdk/jdk/blob/master/src/java.security.sasl/share/classes/com/sun/security/sasl/digest/DigestMD5Base.java]
> The current JDK checks the HMAC before the sequence number, which hides the
> real error (bad sequence number) because SASL is stateful. The JDK should
> check the sequence number FIRST and THEN the HMAC.
> If and when the JDK is patched, and in accordance with
> [https://www.ietf.org/rfc/rfc2831.txt], we will get an exception in that case
> instead of having the message ignored.
>
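
The check order proposed in the description (sequence number first, then HMAC) can be sketched as follows. This is a simplified illustration, not the JDK's DigestMD5Base and not the exact RFC 2831 wire layout: the {{SeqFirstUnwrapper}} class, its message fields, and the exception messages are all hypothetical, and the MAC is a plain HmacMD5 over the sequence number followed by the payload.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical receiver that validates the sequence number before the MAC. */
class SeqFirstUnwrapper {
    private final byte[] key;
    private int expectedSeqNum = 0;

    SeqFirstUnwrapper(byte[] key) {
        this.key = key.clone();
    }

    /** Simplified MAC: HmacMD5 over the 4-byte sequence number followed by the payload. */
    static byte[] mac(byte[] key, int seqNum, byte[] payload) {
        try {
            Mac hmac = Mac.getInstance("HmacMD5");
            hmac.init(new SecretKeySpec(key, "HmacMD5"));
            hmac.update(ByteBuffer.allocate(4).putInt(seqNum).array());
            return hmac.doFinal(payload);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns the payload, or throws with a message naming the actual failure. */
    byte[] unwrap(int seqNum, byte[] payload, byte[] receivedMac) {
        // Check the sequence number first so that reordering is reported as such.
        if (seqNum != expectedSeqNum) {
            throw new IllegalStateException(
                "Out-of-order message: expected seqNum " + expectedSeqNum + " but got " + seqNum);
        }
        // Only then verify integrity (constant-time comparison).
        if (!MessageDigest.isEqual(mac(key, seqNum, payload), receivedMac)) {
            throw new IllegalStateException("MAC mismatch for seqNum " + seqNum);
        }
        expectedSeqNum++;
        return payload;
    }

    public static void main(String[] args) {
        byte[] key = "demo-key".getBytes(StandardCharsets.UTF_8);
        byte[] payload = "response".getBytes(StandardCharsets.UTF_8);
        SeqFirstUnwrapper receiver = new SeqFirstUnwrapper(key);
        try {
            // A correctly MACed message arrives ahead of its turn.
            receiver.unwrap(1, payload, mac(key, 1, payload));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With this order, a reordered but otherwise valid message surfaces as a sequencing error rather than as an unmatched MAC.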
--
This message was sent by Atlassian JIRA
(v7.6.3#76005)