[
https://issues.apache.org/jira/browse/HBASE-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
stack resolved HBASE-3368.
--------------------------
Resolution: Duplicate
Fixed by "HBASE-3559 Move report of split to master OFF the heartbeat channel"
> Split message can come in before region opened message; results in 'Region
> has been PENDING_CLOSE for too long' cycle
> ---------------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-3368
> URL: https://issues.apache.org/jira/browse/HBASE-3368
> Project: HBase
> Issue Type: Bug
> Reporter: stack
> Assignee: stack
> Priority: Critical
> Fix For: 0.92.0
>
> Attachments: 3368-v2.txt, 3368.txt
>
>
> Another good one. Look at these excerpts from master log:
> {code}
> 2010-12-16 00:49:45,749 INFO org.apache.hadoop.hbase.master.ServerManager:
> Received REGION_SPLIT:
> TestTable,0078922610,1292373363753.490b382bae33642d12cd717b5785698b.:
> Daughters;
> TestTable,0078922610,1292460584999.c8b95dfc9a671083bafdaa0341279777.,
> TestTable,0078933586,
> 1292460584999.7cc636c9a7274eec4e784df2efebbca3. from
> XXX185,60020,1292460570976
> ....
> 2010-12-16 00:49:46,132 DEBUG
> org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region
> TestTable,0078922610,1292373363753.490b382bae33642d12cd717b5785698b. on
> XXX185,60020,1292460570976
> {code}
> ... so the split will have cleared the parent from the in-memory data
> structures, and then the open handler adds it back (though the region is
> offlined, split).
> Then the balancer runs... only no one is holding the region that's being
> balanced.
> Over on XXX185 I see the open and then split at these times:
> {code}
> 2010-12-16 00:49:43,740 DEBUG
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Opened
> TestTable,0078922610,1292373363753.490b382bae33642d12cd717b5785698b.
> .....
> 2010-12-16 00:49:45,003 INFO
> org.apache.hadoop.hbase.regionserver.SplitTransaction: Starting split of
> region TestTable,0078922610,1292373363753.490b382bae33642d12cd717b5785698b.
> {code}
> So, the fact that it takes the Master a while to get around to the zk watcher
> processing messes us up.
> Root problem is that we're using two different message buses, zk and then
> heartbeat. Intent is to do everything over zk and remove the heartbeat, but
> I'm looking at what to do for 0.90.0.
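
To illustrate the ordering problem described above, here is a minimal sketch in plain Java. It is not HBase code: `Master`, `onSplitReport`, `onRegionOpened`, and `parentStillOnline` are hypothetical names made up for this example, and the real master keeps far more state. It only assumes that the split report removes the parent from an in-memory map and that the opened event adds the region to it, as the description says.

```java
import java.util.HashMap;
import java.util.Map;

class SplitOpenRace {
    /** Hypothetical stand-in for the master's in-memory region bookkeeping. */
    static class Master {
        // region name -> server currently believed to host it
        final Map<String, String> onlineRegions = new HashMap<>();

        // Arrives over the heartbeat channel: the parent is split, so forget it.
        void onSplitReport(String parent) {
            onlineRegions.remove(parent);
        }

        // Arrives via the zk watcher: record the region as open on the server.
        void onRegionOpened(String region, String server) {
            onlineRegions.put(region, server);
        }
    }

    /**
     * Delivers the two events in the given order and reports whether the
     * master still believes the (already split) parent is online.
     */
    static boolean parentStillOnline(boolean splitArrivesFirst) {
        Master master = new Master();
        String parent = "parent-region";   // placeholder region name
        String server = "XXX185";          // server name taken from the logs

        if (splitArrivesFirst) {
            master.onSplitReport(parent);           // nothing to remove yet
            master.onRegionOpened(parent, server);  // stale entry is added back
        } else {
            master.onRegionOpened(parent, server);
            master.onSplitReport(parent);           // entry is cleaned up
        }
        return master.onlineRegions.containsKey(parent);
    }

    public static void main(String[] args) {
        System.out.println("opened then split: " + parentStillOnline(false));
        System.out.println("split then opened: " + parentStillOnline(true));
    }
}
```

With the events in the order they happened on the regionserver, the parent is gone from the map. With the order the master saw them in the log above, the offlined parent stays in the map, which is the entry the balancer later tries to move. Sending both events over a single ordered channel would avoid the reordering in this sketch.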