[ https://issues.apache.org/jira/browse/HBASE-3266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ted Yu resolved HBASE-3266.
---------------------------
Resolution: Not A Problem
From Todd:
3266 is probably no longer valid given heartbeats don't exist in trunk.
> Master does not seem to properly scan ZK for running RS during startup
> ----------------------------------------------------------------------
>
> Key: HBASE-3266
> URL: https://issues.apache.org/jira/browse/HBASE-3266
> Project: HBase
> Issue Type: Bug
> Components: master
> Affects Versions: 0.90.0
> Reporter: Todd Lipcon
> Priority: Critical
> Fix For: 0.92.0
>
>
> I was in the situation described by HBASE-3265, where I had a number of RS
> waiting on ROOT, but the master hadn't seen any RS checkins, so was waiting
> on checkins. To get past this, I restarted one of the region servers. The
> restarted server checked in, and the master began its startup.
> At this point the master started scanning /hbase/.logs for things to split.
> It correctly identified that the RS on haus01 was running (this is the one I
> restarted):
> 2010-11-23 00:21:25,595 INFO org.apache.hadoop.hbase.master.MasterFileSystem: Log folder hdfs://haus01.sf.cloudera.com:11020/hbase-normal/.logs/haus01.sf.cloudera.com,60020,1290500443143 belongs to an existing region server
> but then incorrectly decided that the RS on haus02 was down:
> 2010-11-23 00:21:25,595 INFO org.apache.hadoop.hbase.master.MasterFileSystem: Log folder hdfs://haus01.sf.cloudera.com:11020/hbase-normal/.logs/haus02.sf.cloudera.com,60020,1290498411450 doesn't belong to a known region server, splitting
> However ZK shows that this RS is up:
> [zk: haus01.sf.cloudera.com:2222(CONNECTED) 3] ls /hbase/rs
> [haus04.sf.cloudera.com,60020,1290498411533,
> haus05.sf.cloudera.com,60020,1290498411520,
> haus03.sf.cloudera.com,60020,1290498411518,
> haus01.sf.cloudera.com,60020,1290500443143,
> haus02.sf.cloudera.com,60020,1290498411450]
> splitLogsAfterStartup seems to check ServerManager.onlineServers, which, as
> best I can tell, is derived from heartbeats and not from ZK (sorry if I got
> some of this wrong; I am still new to this codebase).
> Of course, the master went into an infinite splitting loop at this point
> since haus02 is up and renewing its DFS lease on its logs.
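
A minimal sketch of the mismatch described in the quoted report, written in
Python for brevity and not taken from HBase's Java code. The function names
(server_name_from_log_dir, folders_to_split) are hypothetical, and the
heartbeat view is an assumption based on the report that only the restarted
haus01 had checked in. The log folder paths and the ZK listing are the ones
quoted above.

```python
def server_name_from_log_dir(path):
    """Return the last path component, i.e. 'host,port,startcode'."""
    return path.rstrip("/").rsplit("/", 1)[-1]


def folders_to_split(log_dirs, live_servers):
    """Return log folders whose owning server is not in the live set."""
    live = set(live_servers)
    return [d for d in log_dirs if server_name_from_log_dir(d) not in live]


LOG_ROOT = "hdfs://haus01.sf.cloudera.com:11020/hbase-normal/.logs/"
log_dirs = [
    LOG_ROOT + "haus01.sf.cloudera.com,60020,1290500443143",
    LOG_ROOT + "haus02.sf.cloudera.com,60020,1290498411450",
]

# Assumption: only the restarted RS on haus01 had checked in via heartbeat.
heartbeat_view = ["haus01.sf.cloudera.com,60020,1290500443143"]

# The children of /hbase/rs as listed in the report.
zk_view = [
    "haus04.sf.cloudera.com,60020,1290498411533",
    "haus05.sf.cloudera.com,60020,1290498411520",
    "haus03.sf.cloudera.com,60020,1290498411518",
    "haus01.sf.cloudera.com,60020,1290500443143",
    "haus02.sf.cloudera.com,60020,1290498411450",
]

# With the heartbeat-derived view, haus02's folder is wrongly selected.
split_by_heartbeats = folders_to_split(log_dirs, heartbeat_view)

# With the ZK-derived view, no folder belonging to a live RS is selected.
split_by_zk = folders_to_split(log_dirs, zk_view)
```

Under these assumptions, only the haus02 folder is chosen for splitting when
liveness comes from heartbeats, and nothing is chosen when it comes from the
ZK listing, which matches the behavior the report describes.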