[
https://issues.apache.org/jira/browse/HBASE-25875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17342111#comment-17342111
]
Pankaj Kumar commented on HBASE-25875:
--------------------------------------
This is a race condition: LeaderElector is started (via
AuthenticationTokenSecretManager#retrievePassword) before
AuthenticationTokenSecretManager#start starts it. Calling start on this
already-started thread from AuthenticationTokenSecretManager#start then throws
IllegalThreadStateException.
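For reference, the JDK behaviour behind the failure can be reproduced in isolation. The sketch below is not HBase code: the class name DoubleStartDemo and the thread name are made up for illustration. It only shows that a second Thread#start call on a thread that has already been started is rejected.

```java
public class DoubleStartDemo {

    /** Starts one thread twice and reports whether the second start() call threw. */
    public static boolean secondStartThrows() throws InterruptedException {
        // Stand-in for the leader elector thread (hypothetical name, not the HBase class).
        Thread elector = new Thread(() -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "demo-leaderElector");
        elector.setDaemon(true);

        elector.start(); // first start, e.g. triggered from a request-handling thread
        try {
            elector.start(); // second start, e.g. from the owner's own start() method
            return false;
        } catch (IllegalThreadStateException e) {
            return true;
        } finally {
            elector.interrupt();
            elector.join();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("second start threw: " + secondStartThrows());
    }
}
```

Thread#start only succeeds on a thread that has never been started, so whichever caller arrives second fails, regardless of whether the thread is still running at that point.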
*Analysis:*
1. During RSRpcServices#createRpcServer, we create the NettyRpcServer instance
(NettyRpcServerPreambleHandler, NettyRpcServerRequestDecoder etc. are
initialized).
2. In NettyRpcServer#start, we initialize AuthenticationTokenSecretManager,
set it as the secret manager, and then start it.
3. In AuthenticationTokenSecretManager#start:
- We start ZKSecretWatcher, which internally sets the watcher and refreshes
the nodes.
- We start the LeaderElector.
Since AuthenticationTokenSecretManager is set as the secret manager in step 2,
before it is started, it is already available to NettyServerRpcConnection
processing. So HBaseSaslRpcServer has the AuthenticationTokenSecretManager and
goes into HBaseSaslRpcServer#evaluateResponse, which internally calls
AuthenticationTokenSecretManager#retrievePassword. While retrieving the
password, it starts the LeaderElector if that thread is not alive (the
relevant logs are observed).
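One possible way to make the two start paths safe against each other is to route both through a single "start at most once" guard. The sketch below is only an illustration under that assumption and is not the change made for this issue: GuardedStartSketch, OnceStartedThread and startOnce are hypothetical names, not HBase APIs.

```java
import java.util.concurrent.atomic.AtomicBoolean;

public class GuardedStartSketch {

    /** Hypothetical thread wrapper whose startOnce() can be called safely from several paths. */
    static final class OnceStartedThread extends Thread {
        private final AtomicBoolean started = new AtomicBoolean(false);

        OnceStartedThread(Runnable task, String name) {
            super(task, name);
            setDaemon(true);
        }

        /** Starts the thread on the first call only; returns true if this call started it. */
        boolean startOnce() {
            if (started.compareAndSet(false, true)) {
                start();
                return true;
            }
            return false;
        }
    }

    /** Simulates both start paths and counts how many of them really started the thread. */
    public static int countStarts() throws InterruptedException {
        OnceStartedThread elector = new OnceStartedThread(() -> { }, "demo-leaderElector");
        int starts = 0;
        if (elector.startOnce()) {
            starts++; // path 1: password retrieval while handling a request
        }
        if (elector.startOnce()) {
            starts++; // path 2: the secret manager's own start()
        }
        elector.join();
        return starts;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("threads actually started: " + countStarts());
    }
}
```

The compareAndSet guard is evaluated before Thread#start, so the losing caller returns quietly instead of hitting IllegalThreadStateException. Other options, such as not exposing the secret manager to connections until it has been started, would address the same ordering problem.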
> RegionServer failed to start due to IllegalThreadStateException in
> AuthenticationTokenSecretManager.start
> ---------------------------------------------------------------------------------------------------------
>
> Key: HBASE-25875
> URL: https://issues.apache.org/jira/browse/HBASE-25875
> Project: HBase
> Issue Type: Bug
> Reporter: Pankaj Kumar
> Assignee: Pankaj Kumar
> Priority: Major
>
> RegionServer failed to complete initialization and aborted during
> AuthenticationTokenSecretManager#leaderElector start.
> The following WARN log was observed:
> {noformat}
> 2021-05-03 07:59:01,848 | WARN | RS-EventLoopGroup-1-6 | Thread leaderElector[ZKSecretWatcher-leaderElector:56] is stopped or not alive | org.apache.hadoop.hbase.security.token.AuthenticationTokenSecretManager.retrievePassword(AuthenticationTokenSecretManager.java:153)
> 2021-05-03 07:59:01,848 | INFO | RS-EventLoopGroup-1-6 | Thread leaderElector [ZKSecretWatcher-leaderElector:56] is started | org.apache.hadoop.hbase.security.token.AuthenticationTokenSecretManager.retrievePassword(AuthenticationTokenSecretManager.java:156)
> 2021-05-03 07:59:01,854 | INFO | ZKSecretWatcher-leaderElector | Found existing leader with ID: RS-IP-PORT-StartCode | org.apache.hadoop.hbase.zookeeper.ZKLeaderManager.waitToBecomeLeader(ZKLeaderManager.java:130)
> {noformat}
> As per the code, AuthenticationTokenSecretManager#leaderElector is started
> while retrieving the password, before AuthenticationTokenSecretManager#start
> runs:
>
> [https://github.com/apache/hbase/blob/8c2332d46532135723cc7a6084a2a125f3d9d8db/hbase-server/src/main/java/org/apache/hadoop/hbase/security/token/AuthenticationTokenSecretManager.java#L155]
> So IllegalThreadStateException occurred during
> AuthenticationTokenSecretManager#start:
>
> [https://github.com/apache/hbase/blob/8c2332d46532135723cc7a6084a2a125f3d9d8db/hbase-server/src/main/java/org/apache/hadoop/hbase/security/token/AuthenticationTokenSecretManager.java#L107]
> {noformat}
> 2021-05-03 07:59:02,066 | ERROR | main | Failed construction RegionServer | org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:775)
> java.lang.IllegalThreadStateException
>     at java.lang.Thread.start(Thread.java:708)
>     at org.apache.hadoop.hbase.security.token.AuthenticationTokenSecretManager.start(AuthenticationTokenSecretManager.java:107)
>     at org.apache.hadoop.hbase.ipc.NettyRpcServer.start(NettyRpcServer.java:131)
>     at org.apache.hadoop.hbase.regionserver.RSRpcServices.start(RSRpcServices.java:1695)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.<init>(HRegionServer.java:756)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>     at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>     at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>     at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>     at org.apache.hadoop.hbase.regionserver.HRegionServer.constructRegionServer(HRegionServer.java:3270)
>     at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.start(HRegionServerCommandLine.java:63)
>     at org.apache.hadoop.hbase.regionserver.HRegionServerCommandLine.run(HRegionServerCommandLine.java:87)
> {noformat}