[ https://issues.apache.org/jira/browse/HBASE-16807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Heng Chen updated HBASE-16807:
------------------------------
    Fix Version/s: 0.98.24
                   1.3.1
                   1.4.0
> RegionServer will fail to report new active Hmaster until
> HMaster/RegionServer failover
> ---------------------------------------------------------------------------------------
>
> Key: HBASE-16807
> URL: https://issues.apache.org/jira/browse/HBASE-16807
> Project: HBase
> Issue Type: Bug
> Components: regionserver
> Reporter: Pankaj Kumar
> Assignee: Pankaj Kumar
> Fix For: 2.0.0, 1.4.0, 1.3.1, 0.98.24
>
> Attachments: HBASE-16807-0.98.patch, HBASE-16807-branch-1.3.patch,
> HBASE-16807-branch-1.patch, HBASE-16807.patch
>
>
> It's a little weird, but it happened in a production environment that a few
> RegionServers missed the master znode creation notification on master
> failover. In that case ZooKeeperNodeTracker does not refresh its cached data,
> and MasterAddressTracker keeps returning the old active HMaster's details to
> the RegionServer on ServiceException.
> We do re-create the region server status stub on failure, but without
> refreshing the MasterAddressTracker data.
> In HRegionServer.createRegionServerStatusStub():
> {code}
>     boolean refresh = false; // for the first time, use cached data
>     RegionServerStatusService.BlockingInterface intf = null;
>     boolean interrupted = false;
>     try {
>       while (keepLooping()) {
>         sn = this.masterAddressTracker.getMasterAddress(refresh);
>         if (sn == null) {
>           if (!keepLooping()) {
>             // give up with no connection.
>             LOG.debug("No master found and cluster is stopped; bailing out");
>             return null;
>           }
>           if (System.currentTimeMillis() > (previousLogTime + 1000)) {
>             LOG.debug("No master found; retry");
>             previousLogTime = System.currentTimeMillis();
>           }
>           refresh = true; // let's try pull it from ZK directly
>           if (sleep(200)) {
>             interrupted = true;
>           }
>           continue;
>         }
> {code}
> Here we refresh the node data only when 'sn' is null; otherwise the same
> cached data is used.
> So in the above case the RegionServer will never successfully report to the
> new active HMaster until an HMaster failover or a RegionServer restart.
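
The stale-cache behaviour described above can be reproduced in isolation with a small simulation. This is only a sketch, not the attached patch: FakeZnode, CachingTracker and connect() are hypothetical stand-ins for the master znode, MasterAddressTracker and the retry loop in createRegionServerStatusStub(), and the refreshOnFailure flag is an assumption about one possible fix (force a re-read from ZooKeeper after a failed connection attempt, not only when the address is null).

```java
final class StaleMasterSketch {

    /** Hypothetical stand-in for the master znode: the source of truth. */
    static final class FakeZnode {
        private String data;
        void set(String master) { data = master; }
        String read() { return data; }
    }

    /**
     * Hypothetical stand-in for MasterAddressTracker: it returns the cached
     * value unless the caller explicitly asks for a refresh.
     */
    static final class CachingTracker {
        private final FakeZnode znode;
        private String cached;

        CachingTracker(FakeZnode znode) {
            this.znode = znode;
            this.cached = znode.read();
        }

        String getMasterAddress(boolean refresh) {
            if (refresh) {
                cached = znode.read(); // re-read the source of truth
            }
            return cached;
        }
    }

    /**
     * Simplified retry loop. A "connection" succeeds only if the address
     * handed out by the tracker equals liveMaster. Returns the address that
     * was reached, or null if every attempt failed.
     */
    static String connect(CachingTracker tracker, String liveMaster,
                          int maxAttempts, boolean refreshOnFailure) {
        boolean refresh = false; // first attempt uses cached data
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            String sn = tracker.getMasterAddress(refresh);
            if (sn == null) {
                refresh = true; // behaviour in the quoted code: refresh only on null
                continue;
            }
            if (sn.equals(liveMaster)) {
                return sn; // reached the active master
            }
            // Connection to a stale address failed. Without the next line the
            // loop keeps asking for the same cached value on every attempt.
            if (refreshOnFailure) {
                refresh = true;
            }
        }
        return null;
    }
}
```

With the cache populated before a failover and the notification missed, connect(..., false) never reaches the new master no matter how many attempts are made, while connect(..., true) succeeds on the second attempt.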
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)