HRS#closeAllRegions should take care of HRS#onlineRegions's weak consistency
----------------------------------------------------------------------------
Key: HBASE-4341
URL: https://issues.apache.org/jira/browse/HBASE-4341
Project: HBase
Issue Type: Bug
Components: regionserver
Affects Versions: 0.90.4
Reporter: Jieshan Bean
Assignee: Jieshan Bean
Fix For: 0.90.5
This is why the build "https://builds.apache.org/job/hbase-0.90/282" failed.
In that run, one test case timed out, which caused the whole test process to
be killed.
[Logs]
Here are the related logs (from
org.apache.hadoop.hbase.mapreduce.TestTableMapReduce-output.txt):
{noformat}
2011-08-31 10:09:01,089 INFO
[RegionServer:0;vesta.apache.org,52257,1314785332968.leaseChecker]
regionserver.Leases(124):
RegionServer:0;vesta.apache.org,52257,1314785332968.leaseChecker closing leases
2011-08-31 10:09:01,089 INFO
[RegionServer:0;vesta.apache.org,52257,1314785332968.leaseChecker]
regionserver.Leases(131):
RegionServer:0;vesta.apache.org,52257,1314785332968.leaseChecker closed leases
2011-08-31 10:09:01,403 INFO
[RegionServer:0;vesta.apache.org,52257,1314785332968]
regionserver.HRegionServer(709): Waiting on 1 regions to close
2011-08-31 10:09:01,403 DEBUG
[RegionServer:0;vesta.apache.org,52257,1314785332968]
regionserver.HRegionServer(713):
{74a7a8befdf9561dc1d90c4241afeac7=mrtest,uuu,1314785328546.74a7a8befdf9561dc1d90c4241afeac7.}
2011-08-31 10:09:01,697 INFO [Master:0;vesta.apache.org:50036]
master.ServerManager(465): Waiting on regionserver(s) to go down
vesta.apache.org,52257,1314785332968
2011-08-31 10:09:02,697 INFO [Master:0;vesta.apache.org:50036]
master.ServerManager(465): Waiting on regionserver(s) to go down
vesta.apache.org,52257,1314785332968
2011-08-31 10:09:03,008 INFO [vesta.apache.org:50036.timeoutMonitor]
hbase.Chore(79): vesta.apache.org:50036.timeoutMonitor exiting
2011-08-31 10:09:03,697 INFO [Master:0;vesta.apache.org:50036]
master.ServerManager(465): Waiting on regionserver(s) to go down
vesta.apache.org,52257,1314785332968
2011-08-31 10:09:04,697 INFO [Master:0;vesta.apache.org:50036]
master.ServerManager(465): Waiting on regionserver(s) to go down
vesta.apache.org,52257,1314785332968
2011-08-31 10:09:05,698 INFO [Master:0;vesta.apache.org:50036]
master.ServerManager(465): Waiting on regionserver(s) to go down
vesta.apache.org,52257,1314785332968
2011-08-31 10:09:06,698 INFO [Master:0;vesta.apache.org:50036]
master.ServerManager(465): Waiting on regionserver(s) to go down
vesta.apache.org,52257,1314785332968
2011-08-31 10:09:07,698 INFO [Master:0;vesta.apache.org:50036]
master.ServerManager(465): Waiting on regionserver(s) to go down
vesta.apache.org,52257,1314785332968
{noformat}
[Analysis]
One region was opened while the RS was stopping.
This is the "HRS#closeAllRegions" method:
{noformat}
protected void closeAllRegions(final boolean abort) {
  closeUserRegions(abort);
  -------------------------
  if (meta != null) closeRegion(meta.getRegionInfo(), abort, false);
  if (root != null) closeRegion(root.getRegionInfo(), abort, false);
}
{noformat}
HRS#onlineRegions is a ConcurrentHashMap, whose iterators are only weakly
consistent. Walking this map may therefore miss entries that are added during
the traversal. Once a region is missed, it is never closed, and the
regionserver cannot stop normally. The following logs then appear:
{noformat}
2011-08-31 10:09:01,403 INFO
[RegionServer:0;vesta.apache.org,52257,1314785332968]
regionserver.HRegionServer(709): Waiting on 1 regions to close
2011-08-31 10:09:01,403 DEBUG
[RegionServer:0;vesta.apache.org,52257,1314785332968]
regionserver.HRegionServer(713):
{74a7a8befdf9561dc1d90c4241afeac7=mrtest,uuu,1314785328546.74a7a8befdf9561dc1d90c4241afeac7.}
2011-08-31 10:09:01,697 INFO [Master:0;vesta.apache.org:50036]
master.ServerManager(465): Waiting on regionserver(s) to go down
vesta.apache.org,52257,1314785332968
{noformat}
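One possible way to cope with the weak consistency is to repeat the close pass until the map is observed empty. The sketch below only illustrates that idea and is not necessarily the change committed for this issue. The class CloseAllRegionsSketch and the methods closePass and closeUntilEmpty are hypothetical names, the map is a simplified stand-in for HRS#onlineRegions, and closing is modelled as a synchronous removal, whereas the real regionserver closes regions asynchronously.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, not HBase code.
class CloseAllRegionsSketch {
    // Simplified stand-in for HRS#onlineRegions: encoded name -> region name.
    final Map<String, String> onlineRegions = new ConcurrentHashMap<>();

    // One traversal. The iterator is weakly consistent: a region added by
    // another thread while this loop runs may or may not be visited.
    int closePass() {
        int closed = 0;
        for (String encodedName : onlineRegions.keySet()) {
            // Stand-in for closeRegion(...): drop the region from the map.
            if (onlineRegions.remove(encodedName) != null) {
                closed++;
            }
        }
        return closed;
    }

    // Repeat passes until the map is observed empty, so a region that came
    // online during an earlier pass is picked up by a later one.
    int closeUntilEmpty() {
        int passes = 0;
        while (!onlineRegions.isEmpty()) {
            closePass();
            passes++;
        }
        return passes;
    }

    public static void main(String[] args) {
        CloseAllRegionsSketch rs = new CloseAllRegionsSketch();
        rs.onlineRegions.put("region-a", "t,aaa");
        rs.onlineRegions.put("region-b", "t,bbb");
        rs.closePass();
        // Simulate a region that finished opening after the first pass.
        rs.onlineRegions.put("region-late", "t,uuu");
        rs.closeUntilEmpty();
        System.out.println("regions still online: " + rs.onlineRegions.size());
    }
}
```

Re-checking the map closes regions that a single pass missed, but a region could still come online after the final emptiness check, so a complete fix would presumably also need to reject region opens once the server is stopping.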