[ 
https://issues.apache.org/jira/browse/HBASE-8539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jeffrey Zhong updated HBASE-8539:
---------------------------------

    Attachment: hbase-8539-0.94-addendum.patch
                hbase-8539-addendum.patch

[[email protected]] found that testRegionAssignmentAfterMasterRecoveryDueToZKExpiry, 
as amended by the patch, failed in the security build. 

The reason is that two listeners are registered when rpcServer is constructed, 
and they are not reinitialized during master recovery. If we clear all existing 
listeners during master recovery, we lose these two:
{code}
org.apache.hadoop.hbase.security.token.ZKSecretWatcher
org.apache.hadoop.hbase.zookeeper.ZKLeaderManager
{code}

Therefore, a simpler approach is to remove any old listener instance of the same 
class when adding a new one. 

The master recovery logic on ZK session expiry is problematic; I think we may 
need to discuss it in a separate thread.



                
> Double (or triple ...) ZooKeeper listeners of the same type when Master 
> recovers from ZK SessionExpiredException
> ----------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-8539
>                 URL: https://issues.apache.org/jira/browse/HBASE-8539
>             Project: HBase
>          Issue Type: Bug
>          Components: master
>    Affects Versions: 0.98.0, 0.94.7, 0.95.0
>            Reporter: Jeffrey Zhong
>            Assignee: Jeffrey Zhong
>             Fix For: 0.98.0, 0.94.8, 0.95.1
>
>         Attachments: double-registered listeners.png, 
> hbase-8539-0.94-addendum.patch, hbase-8539-0.94.patch, 
> hbase-8539-addendum.patch, hbase-8539.patch, hbase-8539.patch
>
>
> When the Master tries to recover from ZooKeeper session-expired exceptions, we 
> don't clear previously registered listener instances. As a result, we may end 
> up with two (or more) listeners handling the same events twice. Attached is a 
> screenshot from the debugger showing the issue.
> I considered limiting listeners to one per class, but I think that would 
> restrict listener usage, so I chose to clear existing listeners during 
> recovery for the fix.
> (This issue is unrelated to HBASE-8365: I verified that there were no 
> duplicate listeners when HBASE-8365 occurred.)
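The duplicate registration described in the issue, and the clear-everything approach of the original patch, can be modeled in a small Java sketch. All names here (`DuplicateListenerSketch`, `Watcher`, `registerListener`, `unregisterAllListeners`, `handledAfterRecovery`) are hypothetical and chosen for illustration only; this is not HBase code.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

public class DuplicateListenerSketch {

    // Hypothetical listener type standing in for HBase's ZooKeeper listeners.
    public interface Listener {
        void nodeDeleted(String path);
    }

    // Hypothetical watcher that dispatches each event to every registered listener.
    public static class Watcher {
        private final List<Listener> listeners = new CopyOnWriteArrayList<>();

        public void registerListener(Listener l) {
            listeners.add(l);
        }

        // Models the original patch: forget every listener before recovery re-registers them.
        public void unregisterAllListeners() {
            listeners.clear();
        }

        public void process(String path) {
            for (Listener l : listeners) {
                l.nodeDeleted(path);
            }
        }
    }

    // Simulates startup plus one ZK-expiry recovery; returns how often a single event is handled.
    public static int handledAfterRecovery(boolean clearOnRecovery) {
        Watcher watcher = new Watcher();
        int[] handled = {0};
        Runnable initializeMaster = () -> watcher.registerListener(path -> handled[0]++);
        initializeMaster.run();               // initial master startup registers a listener
        if (clearOnRecovery) {
            watcher.unregisterAllListeners(); // drop listeners from the expired session
        }
        initializeMaster.run();               // recovery registers the same kind of listener again
        watcher.process("/example/znode");    // one ZooKeeper event
        return handled[0];
    }

    public static void main(String[] args) {
        System.out.println("without clearing: " + handledAfterRecovery(false)); // prints 2
        System.out.println("with clearing: " + handledAfterRecovery(true));     // prints 1
    }
}
```

Clearing all listeners removes the duplicate, but it also removes any listener that was registered only once and is not re-registered during recovery, which is how the security build lost `ZKSecretWatcher` and `ZKLeaderManager`.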

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira