[ https://issues.apache.org/jira/browse/HBASE-8539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Jeffrey Zhong updated HBASE-8539:
---------------------------------
Attachment: hbase-8539-0.94-addendum.patch
hbase-8539-addendum.patch
[[email protected]] found that testRegionAssignmentAfterMasterRecoveryDueToZKExpiry,
as amended by the patch, fails on the security build.
The reason is that two listeners are registered when rpcServer is constructed,
and they are not registered again during master recovery, so we lose both of
them if we clear all existing listeners during master recovery:
{code}
org.apache.hadoop.hbase.security.token.ZKSecretWatcher
org.apache.hadoop.hbase.zookeeper.ZKLeaderManager
{code}
Therefore, the simple approach is to remove the old listener instance of the
same class when adding a new one.
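
To make the idea concrete, here is a minimal sketch of "replace the old instance of the same class on registration". It is not the code in the attached patches: Listener, ListenerRegistry, CountingListener, SecretWatcherListener and LeaderManagerListener are hypothetical names used only for illustration, and the real HBase listener and watcher classes are not used here.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical stand-in for a ZooKeeper event listener (not the HBase type).
interface Listener {
    void nodeCreated(String path);
}

// Hypothetical base class that counts the events it receives.
class CountingListener implements Listener {
    int events = 0;

    @Override
    public void nodeCreated(String path) {
        events++;
    }
}

// Hypothetical listener types, loosely modeled on the two classes named above.
class SecretWatcherListener extends CountingListener {}
class LeaderManagerListener extends CountingListener {}

// Hypothetical registry that keeps at most one listener instance per class.
class ListenerRegistry {
    private final List<Listener> listeners = new CopyOnWriteArrayList<>();

    // Remove any registered instance of the same concrete class, then add the
    // new one. Synchronized so that remove-then-add acts as a single step.
    synchronized void register(Listener listener) {
        listeners.removeIf(old -> old.getClass() == listener.getClass());
        listeners.add(listener);
    }

    int size() {
        return listeners.size();
    }

    // Deliver an event to every currently registered listener.
    void fireNodeCreated(String path) {
        for (Listener l : listeners) {
            l.nodeCreated(path);
        }
    }
}
```

With this sketch, registering a second SecretWatcherListener replaces the first one, while a LeaderManagerListener that was registered once and never re-registered stays in place. The trade-off is that two instances of the same listener class can never be registered at the same time.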
The master recovery logic on ZK session expiry is problematic. I think we may
need to discuss it in a separate thread.
> Double(or tripple ...) ZooKeeper listeners of the same type when Master
> recovers from ZK SessionExpiredException
> ----------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-8539
> URL: https://issues.apache.org/jira/browse/HBASE-8539
> Project: HBase
> Issue Type: Bug
> Components: master
> Affects Versions: 0.98.0, 0.94.7, 0.95.0
> Reporter: Jeffrey Zhong
> Assignee: Jeffrey Zhong
> Fix For: 0.98.0, 0.94.8, 0.95.1
>
> Attachments: double-registered listeners.png,
> hbase-8539-0.94-addendum.patch, hbase-8539-0.94.patch,
> hbase-8539-addendum.patch, hbase-8539.patch, hbase-8539.patch
>
>
> When the Master tries to recover from a ZooKeeper session expired exception, we
> don't clean up the previously registered listener instances. As a result, we
> may end up with two (or more) listeners of the same type handling the same
> events. Attached is a screenshot from the debugger that shows the issue.
> I considered limiting registration to one listener per class, but I think that
> would restrict how listeners can be used, so I chose to clear the existing
> listeners during recovery for the fix.
> (This issue is unrelated to HBASE-8365: I verified that there were no
> duplicate listeners when HBASE-8365 happened.)