[
https://issues.apache.org/jira/browse/HBASE-5875?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13262658#comment-13262658
]
ramkrishna.s.vasudevan commented on HBASE-5875:
-----------------------------------------------
I would like to get some suggestions on this:
{code}
boolean rit = this.assignmentManager.
    processRegionInTransitionAndBlockUntilAssigned(HRegionInfo.ROOT_REGIONINFO);
ServerName currentRootServer = null;
if (!catalogTracker.verifyRootRegionLocation(timeout)) {
  currentRootServer = this.catalogTracker.getRootLocation();
{code}
Consider the case where the ROOT node is found in RIT, so processRIT triggers
the assignment.
It so happened that when I called verifyRootRegionLocation, the root node had
been created but the OpenRegionHandler had not yet added the ROOT region to its
memory (a very rare corner case; this happened once while testing). So
verifyRootRegionLocation returns false, and the master treats the server as
expired. We then remove a normal, active RS from the master's memory, thinking
it is dead, and the RS is lost from the master's list of online servers.
How can we handle this scenario?
Can we retry verifyRootRegionLocation if it returns false and the boolean
variable 'rit' is true?
Or can we update the root region node on the RS side after updating the online
server list? Suggestions welcome...
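
To make the first option concrete, here is a minimal standalone sketch of the
retry idea. It is not HBase code: the class RootVerifyRetry, the method
verifyWithRetry, and the maxAttempts/sleepMs parameters are hypothetical names
chosen for illustration, and the BooleanSupplier stands in for a call such as
catalogTracker.verifyRootRegionLocation(timeout). The assumption is that a
failed verification right after processRIT may only mean the RS has not
finished opening ROOT, so it is retried a bounded number of times before the
server is considered for expiry.

```java
import java.util.function.BooleanSupplier;

/** Hypothetical helper for illustration only; not part of HBase. */
public class RootVerifyRetry {

    /**
     * Runs the verification once, or up to maxAttempts times when the region
     * was found in transition (rit == true). Returns true as soon as one
     * check passes, false if every attempt fails or the thread is interrupted.
     */
    public static boolean verifyWithRetry(BooleanSupplier verify, boolean rit,
                                          int maxAttempts, long sleepMs) {
        // Without an in-flight assignment a failure is taken at face value.
        int attempts = rit ? Math.max(1, maxAttempts) : 1;
        for (int i = 0; i < attempts; i++) {
            if (verify.getAsBoolean()) {
                return true;
            }
            if (i < attempts - 1) {
                try {
                    // Give the RS time to finish opening the region.
                    Thread.sleep(sleepMs);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Simulated RS that reports the region online only on the third check.
        int[] calls = {0};
        BooleanSupplier lateOpen = () -> ++calls[0] >= 3;
        boolean verified = verifyWithRetry(lateOpen, true, 5, 10);
        System.out.println("verified=" + verified + " after " + calls[0] + " checks");
    }
}
```

Only when this returns false would the master go on to the expire path. The
retry is bounded so that a genuinely dead RS is still detected.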
> Process RIT and Master restart may remove an online server considering it as
> a dead server
> ------------------------------------------------------------------------------------------
>
> Key: HBASE-5875
> URL: https://issues.apache.org/jira/browse/HBASE-5875
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.92.1
> Reporter: ramkrishna.s.vasudevan
> Assignee: ramkrishna.s.vasudevan
> Fix For: 0.94.1
>
>
> If, on restart, the master finds ROOT/META in RIT state, it tries to assign
> the ROOT region through ProcessRIT.
> The master triggers the assignment and then tries to verify the root region
> location.
> Root region location verification checks whether the RS has the region in
> its online list.
> If the assignment triggered by the master has not yet completed on the RS,
> the root region location verification fails.
> Because it failed, in
> {code}
> splitLogAndExpireIfOnline(currentRootServer);
> {code}
> we split the log and also remove the server from the online server list.
> Ideally there is nothing to split here, as no region server was restarted.
> So although the server is online, the master invalidates the region server.
> In the special case where there is only one RS, the cluster becomes
> inoperative.