[
https://issues.apache.org/jira/browse/HBASE-3362?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12971896#action_12971896
]
HBase Review Board commented on HBASE-3362:
-------------------------------------------
Message from: [email protected]
-----------------------------------------------------------
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/1298/
-----------------------------------------------------------
Review request for hbase and Jonathan Gray.
Summary
-------
M src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java
Removed stale comments and TODOs.
Added a 'version' data member, the znode edit version, which we keep across the
open process.
Refactored the setting of OPENING out into a new tickleOpening method that is
now used in multiple places rather than repeating the code.
Added a new PostOpenDeployTasksThread, which we run to do the
postOpenDeployTasks. While it is running, we keep updating the OPENING state if
it runs for a while.
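To illustrate the idea of running the post-open work on its own thread while the caller keeps re-asserting OPENING, here is a minimal sketch. It is not the code from the patch: the class TickleSketch, the Tickler callback and the runWithTickle method are hypothetical names made up for illustration, and the real ZooKeeper update is reduced to a callback that returns false when ownership has been lost.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only; not the OpenRegionHandler code from the patch.
class TickleSketch {
    /** Hypothetical callback: re-assert OPENING; false means we no longer own the region. */
    interface Tickler {
        boolean tickle();
    }

    /**
     * Runs task on its own thread and calls tickler once per period while the
     * task is still running. periodMs must be greater than zero.
     * Returns true if the task finished without a failed tickle.
     */
    static boolean runWithTickle(Runnable task, Tickler tickler, long periodMs)
            throws InterruptedException {
        Thread worker = new Thread(task, "post-open-deploy-sketch");
        worker.start();
        while (true) {
            worker.join(periodMs);      // wait up to one period for the task
            if (!worker.isAlive()) {
                return true;            // task finished, no tickle failed
            }
            if (!tickler.tickle()) {
                worker.interrupt();     // ask the task to stop; it must honor interrupts
                worker.join();
                return false;
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        AtomicInteger tickles = new AtomicInteger();
        Runnable slowTask = () -> {
            try {
                Thread.sleep(250);      // stands in for a slow .META. update
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        boolean finished = runWithTickle(
            slowTask, () -> { tickles.incrementAndGet(); return true; }, 50);
        System.out.println("finished=" + finished + " tickles=" + tickles.get());
    }
}
```

The caller thread does the tickling here so that a failed tickle can stop the work early; whether the patch arranges it the same way is not something this sketch claims.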
This addresses bug hbase-3362.
http://issues.apache.org/jira/browse/hbase-3362
Diffs
-----
trunk/src/main/java/org/apache/hadoop/hbase/regionserver/handler/OpenRegionHandler.java
1049707
Diff: http://review.cloudera.org/r/1298/diff
Testing
-------
Ran it on my cluster. Seems to work as the old code did.
Thanks,
stack
> If .META. offline between OPENING and OPENED, then wrong server location in
> .META. is possible
> ----------------------------------------------------------------------------------------------
>
> Key: HBASE-3362
> URL: https://issues.apache.org/jira/browse/HBASE-3362
> Project: HBase
> Issue Type: Bug
> Reporter: stack
> Assignee: stack
> Priority: Critical
> Fix For: 0.90.0
>
>
> This is a good one. It happened to me while testing OOME in split logging.
> * Balancer moves region to new location, regionserver X.
> * New location regionserver X successfully opens the region and then goes to
> update .META.
> * At this point, the server carrying .META. crashes.
> * Regionserver X is stuck waiting on .META. to come back online. It takes so
> long that the master times out the region-in-transition.
> * Master assigns the region elsewhere to regionserver Y
> * It opens successfully on regionserver Y and then it also parks waiting on
> .META. coming online
> * .META. comes online
> * The two servers X and Y race to update .META.
> I saw a case where server X's edit went in after server Y's edit, which means
> that lookups in .META. get the wrong server. HBCK can detect this situation.
> RegionServer X, when it wakes up, correctly notices that it has lost control of
> the region, but the damage is done -- where the damage is the .META. edit.
> Chatting with Jon, he suggested that regionserver X should 'rollback' the
> .META. edit -- do an explicit delete of what it added. This would work, I think,
> but after chatting more, I'll make a fix that keeps updating the zookeeper
> OPENING state while the edit goes on in a separate thread. Our continuous
> setting of OPENING will make it so the region-in-transition does not time out.
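One way to picture how a server can notice it has lost control of a region is a version-checked write: every update must name the version it last saw, and a stale version is rejected. The sketch below is a toy in-memory stand-in, not ZooKeeper and not HBase code. The class VersionedNodeSketch, its methods, and the state strings are all hypothetical, chosen only to walk through the X and Y scenario described above.

```java
// Toy stand-in for a versioned node; illustrative only.
class VersionedNodeSketch {
    private String data;
    private int version = 0;

    VersionedNodeSketch(String initial) {
        this.data = initial;
    }

    /**
     * Conditional write: succeeds only if expectedVersion matches the current
     * version. Returns the new version, or -1 if the caller's version is stale.
     */
    synchronized int setData(String newData, int expectedVersion) {
        if (expectedVersion != version) {
            return -1;
        }
        data = newData;
        version++;
        return version;
    }

    /** Unconditional write, standing in for another party forcing the state. */
    synchronized int forceData(String newData) {
        data = newData;
        version++;
        return version;
    }

    synchronized String getData() {
        return data;
    }

    public static void main(String[] args) {
        VersionedNodeSketch node = new VersionedNodeSketch("OPENING by X");
        int xVersion = node.setData("OPENING by X", 0);       // X re-asserts OPENING
        node.forceData("OFFLINE");                            // master times X out
        int yVersion = node.forceData("OPENING by Y");        // region reassigned to Y
        int xResult = node.setData("OPENED by X", xVersion);  // X wakes up with a stale version
        System.out.println("X write result: " + xResult);
        System.out.println("Y write result: " + node.setData("OPENED by Y", yVersion));
        System.out.println("final state: " + node.getData());
    }
}
```

In this toy, X's late write is refused because the version it remembers is no longer current. That only protects the node itself; it does not cover an edit X makes to a different store such as .META., which is the gap the continuous OPENING update is meant to close.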