[ 
https://issues.apache.org/jira/browse/HBASE-23881?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17052522#comment-17052522
 ] 

Josh Elser commented on HBASE-23881:
------------------------------------

So, I'm pretty convinced of the problem in master. I'm currently trying to 
understand whether branch-2 and master are failing in the same manner. I 
_think_ they differ only on the surface, because branch-2 still defaults to 
using NIO.

The curious bit is that we can see the same semantics: the SASL client thinks 
that the handshake is done, but the client doesn't barge ahead the way it does 
in master:
{noformat}
2020-03-05 15:36:54,484 WARN  [RS-EventLoopGroup-1-8] ipc.ServerRpcConnection(377): Auth failed for  192.168.2.28:60616: Unknown
2020-03-05 15:36:54,484 TRACE [Default-IPC-NioEventLoopGroup-4-10] ipc.NettyRpcDuplexHandler(131): got response header , totalSize: 1 bytes
2020-03-05 15:36:54,484 TRACE [RS-EventLoopGroup-1-8] ipc.NettyRpcServerRequestDecoder(76): Connection /192.168.2.28:60616; caught unexpected downstream exception.
org.apache.hadoop.security.token.SecretManager$InvalidToken: Authentication failed for user1
        at org.apache.hadoop.hbase.security.provider.example.ShadeSaslServerAuthenticationProvider$ShadeSaslServerCallbackHandler.handle(ShadeSaslServerAuthenticationProvider.java:171)
        at org.apache.hadoop.hbase.security.provider.example.SaslPlainServer.evaluateResponse(SaslPlainServer.java:108)
        at org.apache.hadoop.hbase.security.HBaseSaslRpcServer.evaluateResponse(HBaseSaslRpcServer.java:65)
        at org.apache.hadoop.hbase.ipc.ServerRpcConnection.saslReadAndProcess(ServerRpcConnection.java:359)
        at org.apache.hadoop.hbase.ipc.NettyServerRpcConnection.process(NettyServerRpcConnection.java:87)
        at org.apache.hadoop.hbase.ipc.NettyServerRpcConnection.process(NettyServerRpcConnection.java:73)
        at org.apache.hadoop.hbase.ipc.NettyServerRpcConnection.process(NettyServerRpcConnection.java:68)
        at org.apache.hadoop.hbase.ipc.NettyRpcServerRequestDecoder.channelRead(NettyRpcServerRequestDecoder.java:62)
...
2020-03-05 15:36:54,485 TRACE [RS-EventLoopGroup-1-8] ipc.NettyRpcServerRequestDecoder(68): Disconnection /192.168.2.28:60616; # active connections=2
2020-03-05 15:36:54,486 INFO  [Default-IPC-NioEventLoopGroup-4-10] ipc.NettyRpcDuplexHandler(220): exceptionCaught org.apache.hbase.thirdparty.com.google.protobuf.UninitializedMessageException: Message missing required fields: is_master_running, {}
2020-03-05 15:36:54,488 INFO  [Default-IPC-NioEventLoopGroup-4-10] ipc.NettyRpcDuplexHandler(210): channelInactive ChannelHandlerContext(NettyRpcDuplexHandler#0, [id: 0x7e79684c, L:/192.168.2.28:60616 ! R:mizar.local/192.168.2.28:60555]), {}
{noformat}
I'm still digging, but I think it has to do with the semantics of 
ConnectionImplementation in 2.x versus ConnectionOverAsyncConnection in master. 
ConnectionImplementation makes a call to {{isMasterRunning}} when creating the 
Master RPC stub. That call ultimately triggers the error above: the client 
tries to parse the response, fails, and retries the RPC. That is, it's only by 
coincidence that branch-2 hit an error _unrelated to authentication_, and that 
error is what caused it to execute the retry logic.
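
To make the distinction concrete, here is a minimal sketch of the retry 
semantics I'm describing. It is not HBase's actual retry code: 
{{RetrySketch}}, {{countAttempts}}, and the nested {{DoNotRetryIOException}} 
are hypothetical stand-ins, and I'm assuming the usual convention that a 
do-not-retry exception fails fast while any other IOException is retried until 
the attempts run out.

```java
import java.io.IOException;

final class RetrySketch {
    /** Hypothetical stand-in for org.apache.hadoop.hbase.DoNotRetryIOException. */
    static class DoNotRetryIOException extends IOException {
        DoNotRetryIOException(String msg) {
            super(msg);
        }
    }

    /**
     * Simulates an RPC that always fails with the given exception and
     * returns how many attempts were made before giving up.
     */
    static int countAttempts(IOException failure, int maxAttempts) {
        int attempts = 0;
        while (attempts < maxAttempts) {
            attempts++;
            try {
                throw failure; // the simulated RPC fails on every attempt
            } catch (DoNotRetryIOException e) {
                break; // fail fast: the caller sees this exception directly
            } catch (IOException e) {
                // any other IOException is treated as retriable; loop again
            }
        }
        return attempts;
    }

    public static void main(String[] args) {
        int authFailure = countAttempts(new DoNotRetryIOException("auth failed"), 3);
        int parseFailure = countAttempts(new IOException("could not parse response"), 3);
        System.out.println("do-not-retry failure: " + authFailure + " attempt(s)");
        System.out.println("generic IO failure:   " + parseFailure + " attempt(s)");
    }
}
```

Under these assumptions the do-not-retry failure is attempted once, while the 
generic failure consumes all three attempts, which is the kind of retry path 
branch-2 appears to take here.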

> TestShadeSaslAuthenticationProvider failures
> --------------------------------------------
>
>                 Key: HBASE-23881
>                 URL: https://issues.apache.org/jira/browse/HBASE-23881
>             Project: HBase
>          Issue Type: Bug
>          Components: test
>    Affects Versions: 3.0.0, 2.3.0
>            Reporter: Bharath Vissapragada
>            Assignee: Josh Elser
>            Priority: Major
>
> TestShadeSaslAuthenticationProvider now fails deterministically with the 
> following exception:
> {noformat}
> java.lang.Exception: Unexpected exception, expected<org.apache.hadoop.hbase.DoNotRetryIOException> but was<java.io.IOException>
>       at org.apache.hadoop.hbase.security.provider.example.TestShadeSaslAuthenticationProvider.testNegativeAuthentication(TestShadeSaslAuthenticationProvider.java:233)
> {noformat}
> The test now fails at a different place than it did before HBASE-18095 was 
> merged, because the RPCs are also a part of connection setup. We might need 
> to rewrite the test.


