[ 
https://issues.apache.org/jira/browse/HBASE-3368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack resolved HBASE-3368.
--------------------------

    Resolution: Duplicate

Fixed by "HBASE-3559  Move report of split to master OFF the heartbeat channel"

> Split message can come in before region opened message; results in 'Region 
> has been PENDING_CLOSE for too long' cycle
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: HBASE-3368
>                 URL: https://issues.apache.org/jira/browse/HBASE-3368
>             Project: HBase
>          Issue Type: Bug
>            Reporter: stack
>            Assignee: stack
>            Priority: Critical
>             Fix For: 0.92.0
>
>         Attachments: 3368-v2.txt, 3368.txt
>
>
> Another good one.  Look at these excerpts from master log:
> {code}
> 2010-12-16 00:49:45,749 INFO org.apache.hadoop.hbase.master.ServerManager: 
> Received REGION_SPLIT: 
> TestTable,0078922610,1292373363753.490b382bae33642d12cd717b5785698b.: 
> Daughters; 
> TestTable,0078922610,1292460584999.c8b95dfc9a671083bafdaa0341279777., 
> TestTable,0078933586,1292460584999.7cc636c9a7274eec4e784df2efebbca3. from 
> XXX185,60020,1292460570976
> ....
> 2010-12-16 00:49:46,132 DEBUG 
> org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region 
> TestTable,0078922610,1292373363753.490b382bae33642d12cd717b5785698b. on 
> XXX185,60020,1292460570976
> {code}
> ... so the split will have cleared the parent from the in-memory data
> structures, and then the open handler adds it back (though the region is
> offlined and split).
> Then the balancer runs... only no one is holding the region that's being 
> balanced.
> Over on XXX185 I see the open and then split at these times:
> {code}
> 2010-12-16 00:49:43,740 DEBUG 
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Opened 
> TestTable,0078922610,1292373363753.490b382bae33642d12cd717b5785698b.
> .....
> 2010-12-16 00:49:45,003 INFO 
> org.apache.hadoop.hbase.regionserver.SplitTransaction: Starting split of 
> region TestTable,0078922610,1292373363753.490b382bae33642d12cd717b5785698b.
> {code}
> So, the fact that it takes the Master a while to get around to the zk watcher 
> processing messes us up.
> Root problem is that we're using two different message buses, zk and then 
> heartbeat.  Intent is to do everything over zk and remove heartbeat, but I'm
> looking at what to do for 0.90.0.
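
The ordering problem described above can be illustrated with a toy model. This is only a sketch, not HBase code: `ToyMaster`, `on_opened`, `on_split`, `replay` and the region names are all hypothetical, and it assumes, going by the report, that the "opened" notification arrives through the zk watcher while the split report arrives on the heartbeat.

```python
# Toy model only: every name here is hypothetical, none of it is an HBase API.

class ToyMaster:
    """Tracks the set of regions the master believes are online."""

    def __init__(self):
        self.online = set()

    def on_opened(self, region):
        # Stands in for the "opened" notification (zk watcher path).
        self.online.add(region)

    def on_split(self, parent, daughters):
        # Stands in for the split report (heartbeat path): drop the
        # parent from memory and record the daughters instead.
        self.online.discard(parent)
        self.online.update(daughters)


def replay(events):
    """Apply events to a fresh master in the given order; return its view."""
    master = ToyMaster()
    for handler_name, args in events:
        getattr(master, handler_name)(*args)
    return master.online


opened = ("on_opened", ("parent",))
split = ("on_split", ("parent", ("daughter_a", "daughter_b")))

in_order = replay([opened, split])    # order seen on the regionserver
reordered = replay([split, opened])   # order seen in the master log

print(sorted(in_order))
print(sorted(reordered))
```

When the two events are replayed in the order the regionserver produced them, only the daughters remain. When the split is processed first, the late "opened" event puts the already-split parent back into the master's view, which is the stale entry the balancer then trips over.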

--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira
