[
https://issues.apache.org/jira/browse/HBASE-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
stack reopened HBASE-3368:
--------------------------
Reopening. Here is full transcript for problem region, a region now stuck in
RIT:
{code}
2011-02-03 06:41:29,466 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
master:60000-0x12dea40e4560000 Async create of unassigned node for
da7cdcde7f10405c294144a35709f90f with OFFLINE state
2011-02-03 06:41:29,601 DEBUG
org.apache.hadoop.hbase.master.AssignmentManager$CreateUnassignedAsyncCallback:
rs=TestTable,0073456533,1296708597440.da7cdcde7f10405c294144a35709f90f.
state=OFFLINE, ts=1296715289466, server=sv2borg181,60020,1296715280082
2011-02-03 06:41:29,826 DEBUG
org.apache.hadoop.hbase.master.AssignmentManager$ExistsUnassignedAsyncCallback:
rs=TestTable,0073456533,1296708597440.da7cdcde7f10405c294144a35709f90f.
state=OFFLINE, ts=1296715289466
2011-02-03 06:41:41,231 INFO org.apache.hadoop.hbase.master.ServerManager:
Received REGION_SPLIT:
TestTable,0073456533,1296708597440.da7cdcde7f10405c294144a35709f90f.:
Daughters;
TestTable,0073456533,1296715299597.51d04ac711214779d429326cfc46f4df.,
TestTable,0073464447,1296715299597.f15a72e22cc041653142e5cf2e788ee8. from
sv2borg181,60020,1296715280082
2011-02-03 06:41:42,172 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
master:60000-0x12dea40e4560000 Retrieved 112 byte(s) of data from znode
/hbase/unassigned/da7cdcde7f10405c294144a35709f90f;
data=region=TestTable,0073456533,1296708597440.da7cdcde7f10405c294144a35709f90f.,
server=sv2borg181,60020,1296715280082, state=RS_ZK_REGION_OPENED
2011-02-03 06:41:42,172 WARN org.apache.hadoop.hbase.master.AssignmentManager:
Split report has RIT node (shouldnt have one): REGION => {NAME =>
'TestTable,0073456533,1296708597440.da7cdcde7f10405c294144a35709f90f.',
STARTKEY => '0073456533', ENDKEY => '0073472373', ENCODED =>
da7cdcde7f10405c294144a35709f90f, TABLE => {{NAME => 'TestTable', FAMILIES =>
[{NAME => 'info', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS =>
'3', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536',
IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}} node:
region=TestTable,0073456533,1296708597440.da7cdcde7f10405c294144a35709f90f.,
server=sv2borg181,60020,1296715280082, state=RS_ZK_REGION_OPENED
2011-02-03 06:41:46,305 DEBUG
org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher:
master:60000-0x12dea40e4560000 Received ZooKeeper Event, type=NodeDataChanged,
state=SyncConnected, path=/hbase/unassigned/da7cdcde7f10405c294144a35709f90f
2011-02-03 06:41:46,330 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
master:60000-0x12dea40e4560000 Retrieved 112 byte(s) of data from znode
/hbase/unassigned/da7cdcde7f10405c294144a35709f90f and set watcher;
region=TestTable,0073456533,1296708597440.da7cdcde7f10405c294144a35709f90f.,
server=sv2borg181,60020,1296715280082, state=RS_ZK_REGION_OPENED
2011-02-03 06:41:46,330 DEBUG org.apache.hadoop.hbase.master.AssignmentManager:
Handling transition=RS_ZK_REGION_OPENED, server=sv2borg181,60020,1296715280082,
region=da7cdcde7f10405c294144a35709f90f
2011-02-03 06:41:46,330 WARN org.apache.hadoop.hbase.master.AssignmentManager:
Received OPENED for region da7cdcde7f10405c294144a35709f90f from server
sv2borg181,60020,1296715280082 but region was in the state null and not in
expected PENDING_OPEN or OPENING states
2011-02-03 06:43:45,920 INFO org.apache.hadoop.hbase.catalog.MetaEditor:
Deleted daughter reference
TestTable,0073456533,1296715299597.51d04ac711214779d429326cfc46f4df.,
qualifier=splitA, from parent
TestTable,0073456533,1296708597440.da7cdcde7f10405c294144a35709f90f.
2011-02-03 06:43:45,924 INFO org.apache.hadoop.hbase.catalog.MetaEditor:
Deleted daughter reference
TestTable,0073464447,1296715299597.f15a72e22cc041653142e5cf2e788ee8.,
qualifier=splitB, from parent
TestTable,0073456533,1296708597440.da7cdcde7f10405c294144a35709f90f.
2011-02-03 06:43:45,924 DEBUG org.apache.hadoop.hbase.master.CatalogJanitor:
Deleting region
TestTable,0073456533,1296708597440.da7cdcde7f10405c294144a35709f90f. because
daughter splits no longer hold references
2011-02-03 06:43:45,924 DEBUG org.apache.hadoop.hbase.regionserver.HRegion:
DELETING region
hdfs://sv2borg180:10000/hbase/TestTable/da7cdcde7f10405c294144a35709f90f
2011-02-03 06:43:45,957 INFO org.apache.hadoop.hbase.catalog.MetaEditor:
Deleted region
TestTable,0073456533,1296708597440.da7cdcde7f10405c294144a35709f90f. from META
2011-02-03 07:30:51,115 DEBUG
org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher:
master:60000-0x12dea40e4560000 Received ZooKeeper Event, type=NodeDeleted,
state=SyncConnected, path=/hbase/unassigned/da7cdcde7f10405c294144a35709f90f
{code}
Workaround is deleting the items from zk. Seems to work fine.
> Split message can come in before region opened message; results in 'Region
> has been PENDING_CLOSE for too long' cycle
> ---------------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-3368
> URL: https://issues.apache.org/jira/browse/HBASE-3368
> Project: HBase
> Issue Type: Bug
> Reporter: stack
> Assignee: stack
> Priority: Blocker
> Fix For: 0.90.0
>
>
> Another good one. Look at these excerpts from master log:
> {code}
> 2010-12-16 00:49:45,749 INFO org.apache.hadoop.hbase.master.ServerManager:
> Received REGION_SPLIT:
> TestTable,0078922610,1292373363753.490b382bae33642d12cd717b5785698b.:
> Daughters;
> TestTable,0078922610,1292460584999.c8b95dfc9a671083bafdaa0341279777.,
> TestTable,0078933586,
> 1292460584999.7cc636c9a7274eec4e784df2efebbca3. from
> XXX185,60020,1292460570976
> ....
> 2010-12-16 00:49:46,132 DEBUG
> org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region
> TestTable,0078922610,1292373363753.490b382bae33642d12cd717b5785698b. on
> XXX185,60020,1292460570976
> {code}
> ... so the split will have cleared the parent from in-memory data structures
> and then the open handler will add them back (though region is offlined,
> split).
> Then the balancer runs....... only no one is holding the region thats being
> balanced.
> Over on XXX185 I see the open and then split at these times:
> {code}
> 2010-12-16 00:49:43,740 DEBUG
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Opened
> TestTable,0078922610,1292373363753.490b382bae33642d12cd717b5785698b.
> .....
> 2010-12-16 00:49:45,003 INFO
> org.apache.hadoop.hbase.regionserver.SplitTransaction: Starting split of
> region TestTable,0078922610,1292373363753.490b382bae33642d12cd717b5785698b.
> {code}
> So, the fact that it takes the Master a while to get around to the zk watcher
> processing messes us up.
> Root problem is that we're using two different message buses, zk and then
> heartbeat. Intent is to do all over zk and remove hearbeat but looking at
> what to do for 0.90.0.
--
This message is automatically generated by JIRA.
-
For more information on JIRA, see: http://www.atlassian.com/software/jira