[ 
https://issues.apache.org/jira/browse/HBASE-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack reopened HBASE-3368:
--------------------------


Reopening.  Here is full transcript for problem region, a region now stuck in 
RIT:

{code}
2011-02-03 06:41:29,466 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
master:60000-0x12dea40e4560000 Async create of unassigned node for 
da7cdcde7f10405c294144a35709f90f with OFFLINE state
2011-02-03 06:41:29,601 DEBUG 
org.apache.hadoop.hbase.master.AssignmentManager$CreateUnassignedAsyncCallback: 
rs=TestTable,0073456533,1296708597440.da7cdcde7f10405c294144a35709f90f. 
state=OFFLINE, ts=1296715289466, server=sv2borg181,60020,1296715280082
2011-02-03 06:41:29,826 DEBUG 
org.apache.hadoop.hbase.master.AssignmentManager$ExistsUnassignedAsyncCallback: 
rs=TestTable,0073456533,1296708597440.da7cdcde7f10405c294144a35709f90f. 
state=OFFLINE, ts=1296715289466
2011-02-03 06:41:41,231 INFO org.apache.hadoop.hbase.master.ServerManager: 
Received REGION_SPLIT: 
TestTable,0073456533,1296708597440.da7cdcde7f10405c294144a35709f90f.: 
Daughters; 
TestTable,0073456533,1296715299597.51d04ac711214779d429326cfc46f4df., 
TestTable,0073464447,1296715299597.f15a72e22cc041653142e5cf2e788ee8. from 
sv2borg181,60020,1296715280082
2011-02-03 06:41:42,172 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: 
master:60000-0x12dea40e4560000 Retrieved 112 byte(s) of data from znode 
/hbase/unassigned/da7cdcde7f10405c294144a35709f90f; 
data=region=TestTable,0073456533,1296708597440.da7cdcde7f10405c294144a35709f90f.,
 server=sv2borg181,60020,1296715280082, state=RS_ZK_REGION_OPENED
2011-02-03 06:41:42,172 WARN org.apache.hadoop.hbase.master.AssignmentManager: 
Split report has RIT node (shouldnt have one): REGION => {NAME => 
'TestTable,0073456533,1296708597440.da7cdcde7f10405c294144a35709f90f.', 
STARTKEY => '0073456533', ENDKEY => '0073472373', ENCODED => 
da7cdcde7f10405c294144a35709f90f, TABLE => {{NAME => 'TestTable', FAMILIES => 
[{NAME => 'info', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => 
'3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536', 
IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}} node: 
region=TestTable,0073456533,1296708597440.da7cdcde7f10405c294144a35709f90f., 
server=sv2borg181,60020,1296715280082, state=RS_ZK_REGION_OPENED
2011-02-03 06:41:46,305 DEBUG 
org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: 
master:60000-0x12dea40e4560000 Received ZooKeeper Event, type=NodeDataChanged, 
state=SyncConnected, path=/hbase/unassigned/da7cdcde7f10405c294144a35709f90f
2011-02-03 06:41:46,330 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: 
master:60000-0x12dea40e4560000 Retrieved 112 byte(s) of data from znode 
/hbase/unassigned/da7cdcde7f10405c294144a35709f90f and set watcher; 
region=TestTable,0073456533,1296708597440.da7cdcde7f10405c294144a35709f90f., 
server=sv2borg181,60020,1296715280082, state=RS_ZK_REGION_OPENED
2011-02-03 06:41:46,330 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Handling transition=RS_ZK_REGION_OPENED, server=sv2borg181,60020,1296715280082, 
region=da7cdcde7f10405c294144a35709f90f
2011-02-03 06:41:46,330 WARN org.apache.hadoop.hbase.master.AssignmentManager: 
Received OPENED for region da7cdcde7f10405c294144a35709f90f from server 
sv2borg181,60020,1296715280082 but region was in  the state null and not in 
expected PENDING_OPEN or OPENING states
2011-02-03 06:43:45,920 INFO org.apache.hadoop.hbase.catalog.MetaEditor: 
Deleted daughter reference 
TestTable,0073456533,1296715299597.51d04ac711214779d429326cfc46f4df., 
qualifier=splitA, from parent 
TestTable,0073456533,1296708597440.da7cdcde7f10405c294144a35709f90f.
2011-02-03 06:43:45,924 INFO org.apache.hadoop.hbase.catalog.MetaEditor: 
Deleted daughter reference 
TestTable,0073464447,1296715299597.f15a72e22cc041653142e5cf2e788ee8., 
qualifier=splitB, from parent 
TestTable,0073456533,1296708597440.da7cdcde7f10405c294144a35709f90f.
2011-02-03 06:43:45,924 DEBUG org.apache.hadoop.hbase.master.CatalogJanitor: 
Deleting region 
TestTable,0073456533,1296708597440.da7cdcde7f10405c294144a35709f90f. because 
daughter splits no longer hold references
2011-02-03 06:43:45,924 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
DELETING region 
hdfs://sv2borg180:10000/hbase/TestTable/da7cdcde7f10405c294144a35709f90f
2011-02-03 06:43:45,957 INFO org.apache.hadoop.hbase.catalog.MetaEditor: 
Deleted region 
TestTable,0073456533,1296708597440.da7cdcde7f10405c294144a35709f90f. from META
2011-02-03 07:30:51,115 DEBUG 
org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher: 
master:60000-0x12dea40e4560000 Received ZooKeeper Event, type=NodeDeleted, 
state=SyncConnected, path=/hbase/unassigned/da7cdcde7f10405c294144a35709f90f
{code}


Workaround is deleting the items from zk.  Seems to work fine.

> Split message can come in before region opened message; results in 'Region 
> has been PENDING_CLOSE for too long' cycle
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3368
>                 URL: https://issues.apache.org/jira/browse/HBASE-3368
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: stack
>            Priority: Blocker
>             Fix For: 0.90.0
>
>
> Another good one.  Look at these excerpts from master log:
> {code}
> 2010-12-16 00:49:45,749 INFO org.apache.hadoop.hbase.master.ServerManager: 
> Received REGION_SPLIT: 
> TestTable,0078922610,1292373363753.490b382bae33642d12cd717b5785698b.: 
> Daughters; 
> TestTable,0078922610,1292460584999.c8b95dfc9a671083bafdaa0341279777., 
> TestTable,0078933586,  
> 1292460584999.7cc636c9a7274eec4e784df2efebbca3. from 
> XXX185,60020,1292460570976
> ....
> 2010-12-16 00:49:46,132 DEBUG 
> org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region 
> TestTable,0078922610,1292373363753.490b382bae33642d12cd717b5785698b. on 
> XXX185,60020,1292460570976
> {code}
> ... so the split will have cleared the parent from in-memory data structures 
> and then the open handler will add them back (though region is offlined, 
> split).
> Then the balancer runs....... only no one is holding the region thats being 
> balanced.
> Over on XXX185 I see the open and then split at these times:
> {code}
> 2010-12-16 00:49:43,740 DEBUG 
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Opened 
> TestTable,0078922610,1292373363753.490b382bae33642d12cd717b5785698b.
> .....
> 2010-12-16 00:49:45,003 INFO 
> org.apache.hadoop.hbase.regionserver.SplitTransaction: Starting split of 
> region TestTable,0078922610,1292373363753.490b382bae33642d12cd717b5785698b.
> {code}
> So, the fact that it takes the Master a while to get around to the zk watcher 
> processing messes us up.
> Root problem is that we're using two different message buses, zk and then 
> heartbeat.  Intent is to do all over zk and remove hearbeat but looking at 
> what to do for 0.90.0.

-- 
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

Reply via email to