[ 
https://issues.apache.org/jira/browse/HBASE-16807?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15572248#comment-15572248
 ] 

Pankaj Kumar commented on HBASE-16807:
--------------------------------------

Thanks @Heng Chen. 

> RegionServer will fail to report new active Hmaster until 
> HMaster/RegionServer failover
> ---------------------------------------------------------------------------------------
>
>                 Key: HBASE-16807
>                 URL: https://issues.apache.org/jira/browse/HBASE-16807
>             Project: HBase
>          Issue Type: Bug
>          Components: regionserver
>            Reporter: Pankaj Kumar
>            Assignee: Pankaj Kumar
>             Fix For: 2.0.0, 1.4.0, 1.3.1, 1.2.5, 0.98.24, 1.1.8
>
>         Attachments: HBASE-16807-0.98.patch, HBASE-16807-branch-1.1.patch, 
> HBASE-16807-branch-1.2.patch, HBASE-16807-branch-1.3.patch, 
> HBASE-16807-branch-1.patch, HBASE-16807.patch
>
>
> It is a little odd, but in a production environment a few RegionServers 
> missed the master znode creation notification during a master failover. In 
> that case ZooKeeperNodeTracker does not refresh its cached data, and 
> MasterAddressTracker keeps returning the old active HMaster's details to the 
> RegionServer on ServiceException.
> We do recreate the RegionServer status stub on failure, but without 
> refreshing the MasterAddressTracker data.
> In HRegionServer.createRegionServerStatusStub():
> {code}
>     boolean refresh = false; // for the first time, use cached data
>     RegionServerStatusService.BlockingInterface intf = null;
>     boolean interrupted = false;
>     try {
>       while (keepLooping()) {
>         sn = this.masterAddressTracker.getMasterAddress(refresh);
>         if (sn == null) {
>           if (!keepLooping()) {
>             // give up with no connection.
>             LOG.debug("No master found and cluster is stopped; bailing out");
>             return null;
>           }
>           if (System.currentTimeMillis() > (previousLogTime + 1000)) {
>             LOG.debug("No master found; retry");
>             previousLogTime = System.currentTimeMillis();
>           }
>           refresh = true; // let's try pull it from ZK directly
>           if (sleep(200)) {
>             interrupted = true;
>           }
>           continue;
>         }
> {code}
> Here the node is refreshed only when 'sn' is null; otherwise the same cached 
> data is reused. 
> So in the above case the RegionServer will never report to the active HMaster 
> successfully until an HMaster failover or a RegionServer restart.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
