If .META. offline between OPENING and OPENED, then wrong server location in 
.META. is possible
----------------------------------------------------------------------------------------------

                 Key: HBASE-3362
                 URL: https://issues.apache.org/jira/browse/HBASE-3362
             Project: HBase
          Issue Type: Bug
            Reporter: stack
            Assignee: stack
            Priority: Critical
             Fix For: 0.90.0


This is a good one.  It happened to me while testing OOME in split logging.

* Balancer moves region to new location, regionserver X.
* New location regionserver X successfully opens the region and then goes to 
update .META.
* At this point, the server carrying .META. crashes.
* Regionserver X is stuck waiting on .META. to come back online.  It takes so 
long that the master times out the region-in-transition.
* Master assigns the region elsewhere to regionserver Y
* It opens successfully on regionserver Y and then it also parks waiting on 
.META. coming online
* .META. comes online
* The two servers X and Y race to update .META.

I saw a case where server X's edit went in after server Y's edit, which means 
that lookups in .META. get the wrong server.  HBCK can detect this situation.
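
The ordering problem can be replayed with a small sketch.  Everything in it is 
hypothetical: the maps stand in for .META. and for the master's view of 
assignments, and isConsistent() is only in the spirit of the mismatch HBCK 
reports, not its actual logic.

```java
import java.util.HashMap;
import java.util.Map;

public class MetaRaceSketch {
    // Hypothetical stand-in for .META.: region name -> hosting server (last write wins).
    static final Map<String, String> meta = new HashMap<>();
    // Hypothetical stand-in for the master's view of where each region is assigned.
    static final Map<String, String> masterAssignments = new HashMap<>();

    // A blind write: nothing checks whether the writer still owns the region.
    static void updateMeta(String region, String server) {
        meta.put(region, server);
    }

    // True only when the location in .META. matches the master's assignment.
    static boolean isConsistent(String region) {
        String inMeta = meta.get(region);
        return inMeta != null && inMeta.equals(masterAssignments.get(region));
    }

    // Replays the bad ordering: Y's edit lands first, X's stale edit lands last.
    static String replayRace() {
        meta.clear();
        masterAssignments.clear();
        masterAssignments.put("region-1", "Y"); // master timed out X and reassigned to Y
        updateMeta("region-1", "Y");            // Y records its location
        updateMeta("region-1", "X");            // X wakes up late and overwrites it
        return meta.get("region-1");
    }

    public static void main(String[] args) {
        String location = replayRace();
        System.out.println("meta says " + location
            + ", consistent=" + isConsistent("region-1"));
    }
}
```

Because the write is unconditional, whichever server writes last wins, 
regardless of which one the master currently considers the owner.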

RegionServer X, when it wakes up, correctly notices that it has lost control of 
the region, but the damage is done -- where the damage is the .META. edit.

Chatting with Jon, he suggested that regionserver X should 'rollback' the 
.META. edit -- do an explicit delete of what it added.  This would work, I 
think, but after chatting more, I'll make a fix that keeps updating the 
zookeeper OPENING state while the edit goes on in a separate thread.  Our 
continuous setting of OPENING will make it so the region-in-transition does 
not time out.
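
A minimal sketch of that idea follows; it is not the actual patch.  The names 
refreshOpeningState() and openWithHeartbeat() are made up for illustration, 
and the counter stands in for re-setting the OPENING state in zookeeper.  The 
edit runs in its own thread while the caller keeps refreshing OPENING until 
the edit returns.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class OpeningHeartbeatSketch {
    // Counts refreshes; a stand-in for re-setting the OPENING state in zookeeper.
    static final AtomicInteger openingRefreshes = new AtomicInteger();

    // Hypothetical: re-assert OPENING so the master's timeout clock restarts.
    static void refreshOpeningState() {
        openingRefreshes.incrementAndGet();
    }

    // Runs the (possibly long-blocking) meta edit in a separate thread and
    // refreshes OPENING from the calling thread until the edit completes.
    // Returns how many refreshes were sent.
    static int openWithHeartbeat(Runnable metaEdit, long refreshIntervalMs) {
        openingRefreshes.set(0);
        CountDownLatch done = new CountDownLatch(1);
        Thread editor = new Thread(() -> {
            try {
                metaEdit.run();
            } finally {
                done.countDown(); // signal completion even if the edit throws
            }
        }, "meta-editor");
        editor.start();
        try {
            // await() returns false on timeout, meaning the edit is still running.
            while (!done.await(refreshIntervalMs, TimeUnit.MILLISECONDS)) {
                refreshOpeningState();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting on meta edit", e);
        }
        return openingRefreshes.get();
    }

    public static void main(String[] args) {
        // Simulate .META. being offline for a while by sleeping inside the edit.
        Runnable slowEdit = () -> {
            try {
                Thread.sleep(300);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        int refreshes = openWithHeartbeat(slowEdit, 50);
        System.out.println("OPENING refreshed " + refreshes + " times");
    }
}
```

The refresh interval would need to be shorter than the master's 
region-in-transition timeout for this to help; the values above are arbitrary.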

