[
https://issues.apache.org/jira/browse/HBASE-3637?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13006614#comment-13006614
]
Todd Lipcon commented on HBASE-3637:
------------------------------------
2011-03-11 06:42:58,301 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
master:60000-0x22ea55e0f670002 Retrieved 65 byte(s) of data from znode
/hbase/unassigned/1028785192 and set watcher; region=.META.,,1,
server=trek08.sf.cloudera.com,60020,1299853933073, state=RS_ZK_REGION_OPENED
2011-03-11 06:42:58,301 INFO org.apache.hadoop.hbase.master.AssignmentManager:
Processing region .META.,,1.1028785192 in state RS_ZK_REGION_OPENED
2011-03-11 06:42:58,302 WARN org.apache.hadoop.hbase.master.AssignmentManager:
Region in transition 1028785192 references a server no longer up
trek08.sf.cloudera.com,60020,1299853933073; letting RIT timeout so will be
assigned elsewhere
2011-03-11 06:42:58,304 DEBUG
org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher:
master:60000-0x22ea55e0f670002 Received ZooKeeper Event, type=NodeDataChanged,
state=SyncConnected, path=/hbase/unassigned/70236052
2011-03-11 06:42:58,305 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
master:60000-0x22ea55e0f670002 Retrieved 65 byte(s) of data from znode
/hbase/unassigned/70236052 and set watcher; region=-ROOT-,,0,
server=trek10.sf.cloudera.com,60020,1299854562169, state=RS_ZK_REGION_OPENED
2011-03-11 06:42:58,305 DEBUG org.apache.hadoop.hbase.master.AssignmentManager:
Handling transition=RS_ZK_REGION_OPENED,
server=trek10.sf.cloudera.com,60020,1299854562169, region=70236052/-ROOT-
2011-03-11 06:42:58,307 DEBUG
org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED
event for 70236052; deleting unassigned node
2011-03-11 06:42:58,308 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
master:60000-0x22ea55e0f670002 Deleting existing unassigned node for 70236052
that is in expected state RS_ZK_REGION_OPENED
2011-03-11 06:42:58,313 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
master:60000-0x22ea55e0f670002 Retrieved 65 byte(s) of data from znode
/hbase/unassigned/70236052; data=region=-ROOT-,,0,
server=trek10.sf.cloudera.com,60020,1299854562169, state=RS_ZK_REGION_OPENED
2011-03-11 06:42:58,315 DEBUG
org.apache.hadoop.hbase.zookeeper.ZooKeeperWatcher:
master:60000-0x22ea55e0f670002 Received ZooKeeper Event, type=NodeDeleted,
state=SyncConnected, path=/hbase/unassigned/70236052
2011-03-11 06:42:58,315 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
master:60000-0x22ea55e0f670002 Successfully deleted unassigned node for region
70236052 in expected state RS_ZK_REGION_OPENED
2011-03-11 06:42:58,316 DEBUG
org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region
-ROOT-,,0.70236052 on trek10.sf.cloudera.com,60020,1299854562169
2011-03-11 06:42:59,097 INFO org.apache.hadoop.hbase.master.AssignmentManager:
Regions in transition timed out: .META.,,1.1028785192 state=OPENING,
ts=1299854016886
2011-03-11 06:42:59,097 INFO org.apache.hadoop.hbase.master.AssignmentManager:
Region has been OPENING for too long, reassigning region=.META.,,1.1028785192
2011-03-11 06:42:59,098 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil:
master:60000-0x22ea55e0f670002 Retrieved 65 byte(s) of data from znode
/hbase/unassigned/1028785192; data=region=.META.,,1,
server=trek08.sf.cloudera.com,60020,1299853933073, state=RS_ZK_REGION_OPENED
2011-03-11 06:42:59,099 DEBUG org.apache.hadoop.hbase.master.AssignmentManager:
Region has transitioned to OPENED, allowing watched event handlers to process
> Region stuck in OPENED state
> ----------------------------
>
> Key: HBASE-3637
> URL: https://issues.apache.org/jira/browse/HBASE-3637
> Project: HBase
> Issue Type: Bug
> Components: master
> Affects Versions: 0.92.0
> Reporter: Todd Lipcon
> Priority: Critical
> Fix For: 0.92.0
>
>
> I don't 100% understand how this happened, but the following was observed:
> - .META. is in OPENED state in ZK, for a server which no longer exists
> - The OPENED handler sees that the server is dead, and figures that the
> regions-in-transition (RIT) timeout will handle it
> - The RIT timeout sees that the region is already in OPENED state, and
> assumes that the OPENED handler will handle it
> - The region loops in the timeout state forever, never actually getting
> reassigned
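
The mutual deferral described above can be illustrated with a small sketch. This is not HBase code: the class, enum, and method names below are hypothetical, and the logic is a simplified model of the two decisions seen in the log (the handler deferring to the RIT timeout for a dead server, and the timeout deferring to the handler once ZK shows OPENED). It only shows why neither side ever reassigns the region.

```java
import java.util.Collections;
import java.util.Set;

// Simplified, hypothetical model of the deadlock; not the actual AssignmentManager logic.
class StuckOpenedSketch {
    enum ZkState { OPENING, OPENED }

    // Stand-in for the handler that processes a region found in OPENED state.
    // Returns true if it resolved the region, false if it deferred.
    static boolean openedHandler(String server, Set<String> liveServers) {
        if (!liveServers.contains(server)) {
            // Mirrors "letting RIT timeout so will be assigned elsewhere".
            return false;
        }
        return true; // server is alive: finish the OPENED transition
    }

    // Stand-in for the RIT timeout check.
    // Returns true if it reassigned the region, false if it deferred.
    static boolean timeoutMonitor(ZkState zkState) {
        if (zkState == ZkState.OPENED) {
            // Mirrors "allowing watched event handlers to process".
            return false;
        }
        return true; // still OPENING after the timeout: reassign
    }

    // Runs both checks for a bounded number of rounds and reports whether
    // either of them ever resolved the region.
    static boolean resolvedWithin(int rounds, String server,
                                  Set<String> liveServers, ZkState zkState) {
        for (int i = 0; i < rounds; i++) {
            if (openedHandler(server, liveServers)) return true;
            if (timeoutMonitor(zkState)) return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> live = Collections.singleton("trek10");
        // Dead server plus OPENED in ZK: each side defers to the other.
        System.out.println(resolvedWithin(5, "trek08", live, ZkState.OPENED));
        // Live server: the handler completes the transition.
        System.out.println(resolvedWithin(5, "trek10", live, ZkState.OPENED));
    }
}
```

In this model the region only stays stuck when both conditions hold at once: the znode says OPENED and the server it names is gone. Having either side take ownership of that combination would break the loop.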