[
https://issues.apache.org/jira/browse/HBASE-23881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17052522#comment-17052522
]
Josh Elser commented on HBASE-23881:
------------------------------------
So, I'm pretty convinced I understand the problem in master. I'm currently trying to
understand whether branch-2 and master are failing in the same manner. I _think_
they only look different on the surface because branch-2 still defaults to
using NIO.
The curious bit is that we see the same behavior in branch-2: the SASL client thinks
that the handshake is done, but the client doesn't barge forward the way it does in
master:
{noformat}
2020-03-05 15:36:54,484 WARN [RS-EventLoopGroup-1-8] ipc.ServerRpcConnection(377): Auth failed for 192.168.2.28:60616: Unknown
2020-03-05 15:36:54,484 TRACE [Default-IPC-NioEventLoopGroup-4-10] ipc.NettyRpcDuplexHandler(131): got response header , totalSize: 1 bytes
2020-03-05 15:36:54,484 TRACE [RS-EventLoopGroup-1-8] ipc.NettyRpcServerRequestDecoder(76): Connection /192.168.2.28:60616; caught unexpected downstream exception.
org.apache.hadoop.security.token.SecretManager$InvalidToken: Authentication failed for user1
	at org.apache.hadoop.hbase.security.provider.example.ShadeSaslServerAuthenticationProvider$ShadeSaslServerCallbackHandler.handle(ShadeSaslServerAuthenticationProvider.java:171)
	at org.apache.hadoop.hbase.security.provider.example.SaslPlainServer.evaluateResponse(SaslPlainServer.java:108)
	at org.apache.hadoop.hbase.security.HBaseSaslRpcServer.evaluateResponse(HBaseSaslRpcServer.java:65)
	at org.apache.hadoop.hbase.ipc.ServerRpcConnection.saslReadAndProcess(ServerRpcConnection.java:359)
	at org.apache.hadoop.hbase.ipc.NettyServerRpcConnection.process(NettyServerRpcConnection.java:87)
	at org.apache.hadoop.hbase.ipc.NettyServerRpcConnection.process(NettyServerRpcConnection.java:73)
	at org.apache.hadoop.hbase.ipc.NettyServerRpcConnection.process(NettyServerRpcConnection.java:68)
	at org.apache.hadoop.hbase.ipc.NettyRpcServerRequestDecoder.channelRead(NettyRpcServerRequestDecoder.java:62)
	...
2020-03-05 15:36:54,485 TRACE [RS-EventLoopGroup-1-8] ipc.NettyRpcServerRequestDecoder(68): Disconnection /192.168.2.28:60616; # active connections=2
2020-03-05 15:36:54,486 INFO [Default-IPC-NioEventLoopGroup-4-10] ipc.NettyRpcDuplexHandler(220): exceptionCaught
org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException: Message missing required fields: is_master_running, {}
2020-03-05 15:36:54,488 INFO [Default-IPC-NioEventLoopGroup-4-10] ipc.NettyRpcDuplexHandler(210): channelInactive ChannelHandlerContext(NettyRpcDuplexHandler#0, [id: 0x7e79684c, L:/192.168.2.28:60616 ! R:mizar.local/192.168.2.28:60555]), {}
{noformat}
I'm still digging, but I think it has to do with the semantics of
ConnectionImplementation in 2.x versus ConnectionOverAsyncConnection in master.
ConnectionImplementation makes a call to {{isMasterRunning}} when creating
the Master RPC stub. This ultimately triggers the error above: the client tries to
parse the response, fails, and retries the RPC. In other words, branch-2 only
executes the retry logic because, by coincidence, it hit an error _unrelated to
authentication_.
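To make the retry point concrete, here is a toy model in plain Java. It is a sketch under stated assumptions, not HBase's actual client code: {{RetrySketch}}, {{callWithRetries}} and {{DoNotRetrySketchException}} are hypothetical names, the last one standing in for {{DoNotRetryIOException}}. It only shows how the exception type a caller (or a test) observes depends on the path a failure takes: a do-not-retry failure surfaces unchanged after one attempt, while any other {{IOException}}, such as a response that fails to parse, is retried and finally reported as a plain {{IOException}}.

```java
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Toy model (NOT HBase code) of a client retry loop that distinguishes
 * "do not retry" failures from ordinary IOExceptions.
 */
public class RetrySketch {

  /** Hypothetical stand-in for org.apache.hadoop.hbase.DoNotRetryIOException. */
  public static class DoNotRetrySketchException extends IOException {
    public DoNotRetrySketchException(String msg) {
      super(msg);
    }
  }

  /** One attempt at a hypothetical RPC. */
  @FunctionalInterface
  public interface Attempt<T> {
    T run() throws IOException;
  }

  /**
   * Runs the attempt up to maxAttempts times (assumed >= 1). A DoNotRetrySketchException
   * is rethrown unchanged the first time it occurs; any other IOException is retried, and
   * once the attempts are exhausted the last one is wrapped in a plain IOException.
   */
  public static <T> T callWithRetries(Attempt<T> attempt, int maxAttempts) throws IOException {
    IOException last = null;
    for (int i = 0; i < maxAttempts; i++) {
      try {
        return attempt.run();
      } catch (DoNotRetrySketchException e) {
        throw e; // fail fast: the caller sees the original exception type
      } catch (IOException e) {
        last = e; // treated as transient: try again
      }
    }
    throw new IOException("Failed after " + maxAttempts + " attempts", last);
  }

  public static void main(String[] args) {
    AtomicInteger attempts = new AtomicInteger();

    // Path 1: the authentication failure is reported as a do-not-retry error.
    try {
      callWithRetries(() -> {
        attempts.incrementAndGet();
        throw new DoNotRetrySketchException("Authentication failed for user1");
      }, 3);
    } catch (IOException e) {
      // prints "DoNotRetrySketchException after 1 attempt(s)"
      System.out.println(e.getClass().getSimpleName() + " after " + attempts.get() + " attempt(s)");
    }

    // Path 2: an unrelated failure (e.g. a response that cannot be parsed) is retried.
    attempts.set(0);
    try {
      callWithRetries(() -> {
        attempts.incrementAndGet();
        throw new IOException("Message missing required fields");
      }, 3);
    } catch (IOException e) {
      // prints "IOException after 3 attempt(s)"
      System.out.println(e.getClass().getSimpleName() + " after " + attempts.get() + " attempt(s)");
    }
  }
}
```

The sketch deliberately ignores how ConnectionImplementation and ConnectionOverAsyncConnection actually route failures; it just illustrates why a test expecting the do-not-retry type can instead see a plain {{IOException}} when the failure takes the retry path.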
> TestShadeSaslAuthenticationProvider failures
> --------------------------------------------
>
> Key: HBASE-23881
> URL: https://issues.apache.org/jira/browse/HBASE-23881
> Project: HBase
> Issue Type: Bug
> Components: test
> Affects Versions: 3.0.0, 2.3.0
> Reporter: Bharath Vissapragada
> Assignee: Josh Elser
> Priority: Major
>
> TestShadeSaslAuthenticationProvider now fails deterministically with the
> following exception:
> {noformat}
> java.lang.Exception: Unexpected exception, expected<org.apache.hadoop.hbase.DoNotRetryIOException> but was<java.io.IOException>
> 	at org.apache.hadoop.hbase.security.provider.example.TestShadeSaslAuthenticationProvider.testNegativeAuthentication(TestShadeSaslAuthenticationProvider.java:233)
> {noformat}
> The test now fails in a different place than it did before HBASE-18095 was merged,
> because RPCs are now also part of connection setup. We might need to rewrite the
> test.
--
This message was sent by Atlassian Jira
(v8.3.4#803005)