Split message can come in before region opened message; results in 'Region has 
been PENDING_CLOSE for too long' cycle
---------------------------------------------------------------------------------------------------------------------

                 Key: HBASE-3368
                 URL: https://issues.apache.org/jira/browse/HBASE-3368
             Project: HBase
          Issue Type: Bug
            Reporter: stack
             Fix For: 0.90.0


Another good one.  Look at these excerpts from master log:

{code}
2010-12-16 00:49:45,749 INFO org.apache.hadoop.hbase.master.ServerManager: 
Received REGION_SPLIT: 
TestTable,0078922610,1292373363753.490b382bae33642d12cd717b5785698b.: 
Daughters; 
TestTable,0078922610,1292460584999.c8b95dfc9a671083bafdaa0341279777., 
TestTable,0078933586,  
1292460584999.7cc636c9a7274eec4e784df2efebbca3. from XXX185,60020,1292460570976
....
2010-12-16 00:49:46,132 DEBUG 
org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region 
TestTable,0078922610,1292373363753.490b382bae33642d12cd717b5785698b. on 
XXX185,60020,1292460570976
{code}

... so the split will have cleared the parent from in-memory data structures 
and then the open handler will add them back (though region is offlined, split).

Then the balancer runs....... only no one is holding the region thats being 
balanced.

Over on XXX185 I see the open and then split at these times:

{code}
2010-12-16 00:49:43,740 DEBUG 
org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Opened 
TestTable,0078922610,1292373363753.490b382bae33642d12cd717b5785698b.
.....
2010-12-16 00:49:45,003 INFO 
org.apache.hadoop.hbase.regionserver.SplitTransaction: Starting split of region 
TestTable,0078922610,1292373363753.490b382bae33642d12cd717b5785698b.

{code}

So, the fact that it takes the Master a while to get around to the zk watcher 
processing messes us up.

Root problem is that we're using two different message buses, zk and then 
heartbeat.  Intent is to do all over zk and remove hearbeat but looking at what 
to do for 0.90.0.


-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.

Reply via email to