Maryann Xue created HBASE-6299:
----------------------------------

             Summary: RS starts region open while fails ack to 
HMaster.sendRegionOpen() causes inconsistency in HMaster's region state and a 
series of successive problems.
                 Key: HBASE-6299
                 URL: https://issues.apache.org/jira/browse/HBASE-6299
             Project: HBase
          Issue Type: Bug
          Components: master
    Affects Versions: 0.94.0, 0.90.6
            Reporter: Maryann Xue
            Assignee: Maryann Xue
            Priority: Critical


1. HMaster tries to assign a region to an RS.
2. HMaster creates a RegionState for this region and puts it into 
regionsInTransition.
3. In the first assign attempt, HMaster calls RS.openRegion(). The RS receives 
the open region request and starts to proceed, with success eventually. 
However, due to network problems, HMaster fails to receive the response for the 
openRegion() call, and the call times out.
4. HMaster attemps to assign for a second time, choosing another RS. 
5. But since the HMaster's OpenedRegionHandler has been triggered by the region 
open of the previous RS, and the RegionState has already been removed from 
regionsInTransition, HMaster finds invalid and ignores the unassigned ZK node 
"RS_ZK_REGION_OPENING" updated by the second attempt.
6. The unassigned ZK node stays and a later unassign fails coz 
RS_ZK_REGION_CLOSING cannot be created.
{code}
2012-06-29 07:03:38,870 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Using pre-existing plan for region 
CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.;
 
plan=hri=CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.,
 src=swbss-hadoop-004,60020,1340890123243, 
dest=swbss-hadoop-006,60020,1340890678078
2012-06-29 07:03:38,870 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Assigning region 
CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.
 to swbss-hadoop-006,60020,1340890678078
2012-06-29 07:03:38,870 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Handling transition=M_ZK_REGION_OFFLINE, server=swbss-hadoop-002:60000, 
region=b713fd655fa02395496c5a6e39ddf568
2012-06-29 07:06:28,882 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Handling transition=RS_ZK_REGION_OPENING, 
server=swbss-hadoop-006,60020,1340890678078, 
region=b713fd655fa02395496c5a6e39ddf568
2012-06-29 07:06:32,291 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Handling transition=RS_ZK_REGION_OPENING, 
server=swbss-hadoop-006,60020,1340890678078, 
region=b713fd655fa02395496c5a6e39ddf568
2012-06-29 07:06:32,299 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Handling transition=RS_ZK_REGION_OPENED, 
server=swbss-hadoop-006,60020,1340890678078, 
region=b713fd655fa02395496c5a6e39ddf568
2012-06-29 07:06:32,299 DEBUG 
org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED 
event for 
CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.
 from serverName=swbss-hadoop-006,60020,1340890678078, load=(requests=518945, 
regions=575, usedHeap=15282, maxHeap=31301); deleting unassigned node
2012-06-29 07:06:32,299 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
master:60000-0x2377fee2ae80007 Deleting existing unassigned node for 
b713fd655fa02395496c5a6e39ddf568 that is in expected state RS_ZK_REGION_OPENED
2012-06-29 07:06:32,301 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
master:60000-0x2377fee2ae80007 Successfully deleted unassigned node for region 
b713fd655fa02395496c5a6e39ddf568 in expected state RS_ZK_REGION_OPENED
2012-06-29 07:06:32,301 DEBUG 
org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: The master has 
opened the region 
CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.
 that was online on serverName=swbss-hadoop-006,60020,1340890678078, 
load=(requests=518945, regions=575, usedHeap=15282, maxHeap=31301)
2012-06-29 07:07:41,140 WARN org.apache.hadoop.hbase.master.AssignmentManager: 
Failed assignment of 
CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.
 to serverName=swbss-hadoop-006,60020,1340890678078, load=(requests=0, 
regions=575, usedHeap=0, maxHeap=0), trying to assign elsewhere instead; retry=0
java.net.SocketTimeoutException: Call to /172.16.0.6:60020 failed on socket 
timeout exception: java.net.SocketTimeoutException: 120000 millis timeout while 
waiting for channel to be ready for read. ch : 
java.nio.channels.SocketChannel[connected local=/172.16.0.2:53765 
remote=/172.16.0.6:60020]
        at 
org.apache.hadoop.hbase.ipc.HBaseClient.wrapException(HBaseClient.java:805)
        at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:778)
        at 
org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:283)
        at $Proxy8.openRegion(Unknown Source)
        at 
org.apache.hadoop.hbase.master.ServerManager.sendRegionOpen(ServerManager.java:573)
        at 
org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:1127)
        at 
org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:912)
        at 
org.apache.hadoop.hbase.master.AssignmentManager.assign(AssignmentManager.java:892)
        at 
org.apache.hadoop.hbase.master.handler.ClosedRegionHandler.process(ClosedRegionHandler.java:92)
        at 
org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:162)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:662)
Caused by: java.net.SocketTimeoutException: 120000 millis timeout while waiting 
for channel to be ready for read. ch : 
java.nio.channels.SocketChannel[connected local=/172.16.0.2:53765 
remote=/172.16.0.6:60020]
        at 
org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
        at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:155)
        at 
org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:128)
        at java.io.FilterInputStream.read(FilterInputStream.java:116)
        at 
org.apache.hadoop.hbase.ipc.HBaseClient$Connection$PingInputStream.read(HBaseClient.java:301)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:237)
        at java.io.DataInputStream.readInt(DataInputStream.java:370)
        at 
org.apache.hadoop.hbase.ipc.HBaseClient$Connection.receiveResponse(HBaseClient.java:541)
        at 
org.apache.hadoop.hbase.ipc.HBaseClient$Connection.run(HBaseClient.java:479)
2012-06-29 07:07:41,142 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
No previous transition plan was found (or we are ignoring an existing plan) for 
CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.
 so generated a random one; 
hri=CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.,
 src=, dest=swbss-hadoop-164,60020,1340888346294; 15 (online=15, 
exclude=serverName=swbss-hadoop-006,60020,1340890678078, load=(requests=0, 
regions=575, usedHeap=0, maxHeap=0)) available servers
2012-06-29 07:07:41,142 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
master:60000-0x2377fee2ae80007 Creating (or updating) unassigned node for 
b713fd655fa02395496c5a6e39ddf568 with OFFLINE state
2012-06-29 07:07:41,145 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Using pre-existing plan for region 
CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.;
 
plan=hri=CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.,
 src=, dest=swbss-hadoop-164,60020,1340888346294
2012-06-29 07:07:41,145 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Assigning region 
CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.
 to swbss-hadoop-164,60020,1340888346294
2012-06-29 07:07:41,149 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Handling transition=RS_ZK_REGION_OPENING, 
server=swbss-hadoop-164,60020,1340888346294, 
region=b713fd655fa02395496c5a6e39ddf568
2012-06-29 07:07:41,150 WARN org.apache.hadoop.hbase.master.AssignmentManager: 
Received OPENING for region b713fd655fa02395496c5a6e39ddf568 from server 
swbss-hadoop-164,60020,1340888346294 but region was in  the state null and not 
in expected PENDING_OPEN or OPENING states
2012-06-29 07:07:41,296 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Handling transition=RS_ZK_REGION_OPENING, 
server=swbss-hadoop-164,60020,1340888346294, 
region=b713fd655fa02395496c5a6e39ddf568
2012-06-29 07:07:41,296 WARN org.apache.hadoop.hbase.master.AssignmentManager: 
Received OPENING for region b713fd655fa02395496c5a6e39ddf568 from server 
swbss-hadoop-164,60020,1340888346294 but region was in  the state null and not 
in expected PENDING_OPEN or OPENING states
2012-06-29 07:07:41,302 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Handling transition=RS_ZK_REGION_OPENED, 
server=swbss-hadoop-164,60020,1340888346294, 
region=b713fd655fa02395496c5a6e39ddf568
2012-06-29 07:07:41,302 WARN org.apache.hadoop.hbase.master.AssignmentManager: 
Received OPENED for region b713fd655fa02395496c5a6e39ddf568 from server 
swbss-hadoop-164,60020,1340888346294 but region was in  the state null and not 
in expected PENDING_OPEN or OPENING states
2012-06-29 07:08:38,872 INFO org.apache.hadoop.hbase.master.HMaster: balance 
hri=CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.,
 src=swbss-hadoop-006,60020,1340890678078, 
dest=swbss-hadoop-008,60020,1340891085175
2012-06-29 07:08:38,872 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Starting unassignment of region 
CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.
 (offlining)
2012-06-29 07:08:47,875 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Sent CLOSE to serverName=swbss-hadoop-006,60020,1340890678078, 
load=(requests=0, regions=0, usedHeap=0, maxHeap=0) for region 
CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.
...
2012-06-29 08:04:37,681 INFO org.apache.hadoop.hbase.master.AssignmentManager: 
Regions in transition timed out:  
CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.
 state=PENDING_CLOSE, ts=1340926468331, server=null
2012-06-29 08:04:37,681 INFO org.apache.hadoop.hbase.master.AssignmentManager: 
Region has been PENDING_CLOSE for too long, running forced unassign again on 
region=CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.
2012-06-29 08:04:47,681 INFO org.apache.hadoop.hbase.master.AssignmentManager: 
Regions in transition timed out:  
CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.
 state=PENDING_CLOSE, ts=1340926468331, server=null
2012-06-29 08:04:47,682 INFO org.apache.hadoop.hbase.master.AssignmentManager: 
Region has been PENDING_CLOSE for too long, running forced unassign again on 
region=CDR_STATS_TRAFFIC,13184390567|20120508|17||2|3|913,1337256975556.b713fd655fa02395496c5a6e39ddf568.
{code}



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to