If .META. offline between OPENING and OPENED, then wrong server location in
.META. is possible
----------------------------------------------------------------------------------------------
Key: HBASE-3362
URL: https://issues.apache.org/jira/browse/HBASE-3362
Project: HBase
Issue Type: Bug
Reporter: stack
Assignee: stack
Priority: Critical
Fix For: 0.90.0
This is a good one. It happened to me while testing OOME in split logging.
* Balancer moves region to new location, regionserver X.
* New location regionserver X successfully opens the region and then goes to
update .META.
* At this point, the server carrying .META. crashes.
* Regionserver X is stuck waiting on .META. to come back online. It takes so
long that the master times out the region-in-transition
* Master assigns the region elsewhere to regionserver Y
* It opens successfully on regionserver Y and then it also parks waiting on
.META. coming online
* .META. comes online
* The two servers X and Y race to update .META.
I saw a case where server X's edit went in after server Y's edit, which means
that lookups in .META. get the wrong server. HBCK can detect this situation.
Regionserver X, when it wakes up, correctly notices that it has lost control
of the region, but the damage is done, where the damage is the .META. edit.
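The sequence above boils down to a last-writer-wins race. The following is a toy model of it, not HBase code: the class `MetaRaceSketch`, its map standing in for .META., and all method names are hypothetical and exist only to illustrate the ordering problem.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of the race described above. Not HBase code; all names are hypothetical. */
public class MetaRaceSketch {
    // Stand-in for the .META. table: region name -> server location.
    private final Map<String, String> meta = new HashMap<>();
    // Stand-in for the master's view of which server owns the region.
    private String owner;

    public void assign(String server) { owner = server; }

    public String owner() { return owner; }

    // Unconditional write, as in the scenario: whoever writes last wins.
    public void blindUpdate(String region, String server) { meta.put(region, server); }

    public String lookup(String region) { return meta.get(region); }

    public static void main(String[] args) {
        MetaRaceSketch s = new MetaRaceSketch();
        s.assign("X");              // balancer moves the region to X
        s.assign("Y");              // master times out X and reassigns to Y
        s.blindUpdate("r1", "Y");   // once .META. is back, Y's edit lands first
        s.blindUpdate("r1", "X");   // X's stale edit lands second
        // The lookup now disagrees with the master's view of the owner.
        System.out.println("meta=" + s.lookup("r1") + " owner=" + s.owner());
    }
}
```

In this model the lookup returns X even though the master considers Y the owner, which is the inconsistency described above.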
Chatting with Jon, he suggested that regionserver X should 'rollback' the
.META. edit, that is, do an explicit delete of what it added. This would
work, I think, but after chatting more, I'll make a fix that keeps updating
the ZooKeeper OPENING state while the edit goes on in a separate thread. Our
continuous setting of OPENING will keep the region-in-transition from timing
out.
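The proposed fix can be sketched roughly as follows. This is only an illustration of the idea, not the actual patch: `OpeningKeepAliveSketch`, `openWithKeepAlive`, and the `refreshOpening` callback are hypothetical names, and the callback merely stands in for re-setting the OPENING state in ZooKeeper.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Sketch of the keep-alive idea. Not HBase code; all names are hypothetical. */
public class OpeningKeepAliveSketch {

    /**
     * Runs metaEdit on a separate thread. While it is still running, calls
     * refreshOpening once per period (a stand-in for re-setting OPENING).
     * Returns how many refreshes were made.
     */
    public static int openWithKeepAlive(Runnable metaEdit, Runnable refreshOpening, long periodMs) {
        CountDownLatch editDone = new CountDownLatch(1);
        Thread editor = new Thread(() -> {
            try {
                metaEdit.run();
            } finally {
                editDone.countDown(); // signal completion even if the edit throws
            }
        }, "meta-editor");
        editor.start();

        int refreshes = 0;
        try {
            // await() returns false on timeout, meaning the edit is still in progress.
            while (!editDone.await(periodMs, TimeUnit.MILLISECONDS)) {
                refreshOpening.run();
                refreshes++;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt and stop refreshing
        }
        return refreshes;
    }

    public static void main(String[] args) {
        // Simulate a .META. edit that blocks for a while, as if .META. were offline.
        Runnable slowEdit = () -> {
            try {
                Thread.sleep(100);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        int n = openWithKeepAlive(slowEdit, () -> System.out.println("re-set OPENING"), 20);
        System.out.println("refreshes while waiting on .META.: " + n);
    }
}
```

As long as the refresh period is shorter than the master's region-in-transition timeout, the master keeps seeing fresh OPENING updates and has no reason to reassign the region while the edit is pending.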