[ https://issues.apache.org/jira/browse/HBASE-7521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13550902#comment-13550902 ]
rajeshbabu commented on HBASE-7521:
-----------------------------------
The patch works fine for OPENING (the HBASE-6060 patches also work fine in this
case), but there are issues with PENDING_OPEN.
{code}
for (RegionState rit : ritsNotYetOnServer) {
  if (rit.isPendingOpen() || rit.isOpening()) {
    LOG.info("Hijacking and reassigning " +
        rit.getRegion().getRegionNameAsString() +
        " that was on " + serverName + " in " + rit.getState() + " state.");
    this.services.getAssignmentManager().assign(rit.getRegion(), true,
        true, true);
  }
}
{code}
Here, if we see a region in PENDING_OPEN on the dead server, we reassign the
region.
In the single-region assign path, if we see that the server is dead, we also
retry the assign on some other region server.
The main race condition can then happen in the code below, as in HBASE-5816:
{code}
if (!hijack && !state.isClosed() && !state.isOffline()) {
  if (!regionAlreadyInTransitionException) {
    String msg = "Unexpected state : " + state + " .. Cannot transit it to OFFLINE.";
    this.master.abort(msg, new IllegalStateException(msg));
    return -1;
  }
  LOG.debug("Unexpected state : " + state
      + " but retrying to assign because RegionAlreadyInTransitionException.");
}
{code}
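To make the abort path concrete, here is a minimal model of the guard quoted above. It is only a sketch: the class, the enums and the `check` method are hypothetical names, not HBase code, and the only thing taken from the snippet is the boolean condition. It shows that a non-hijack assign that finds the region in a state other than CLOSED or OFFLINE (for example PENDING_OPEN, if a concurrent assignment got there first) aborts the master unless the retry was triggered by RegionAlreadyInTransitionException.

```java
// Toy model of the guard above. Hypothetical names; not the HBase API.
final class ForceOfflineGuardModel {

    enum State { OFFLINE, CLOSED, PENDING_OPEN, OPENING }

    enum Action { PROCEED, PROCEED_AFTER_DEBUG_LOG, ABORT_MASTER }

    /** Mirrors: if (!hijack && !state.isClosed() && !state.isOffline()) { ... } */
    static Action check(State state, boolean hijack,
                        boolean regionAlreadyInTransitionException) {
        boolean unexpected =
            !hijack && state != State.CLOSED && state != State.OFFLINE;
        if (!unexpected) {
            return Action.PROCEED;
        }
        // Inner branch of the snippet: abort unless we are retrying because
        // of RegionAlreadyInTransitionException.
        return regionAlreadyInTransitionException
            ? Action.PROCEED_AFTER_DEBUG_LOG
            : Action.ABORT_MASTER;
    }

    public static void main(String[] args) {
        for (State s : State.values()) {
            System.out.println(s + " hijack=false -> " + check(s, false, false));
            System.out.println(s + " hijack=true  -> " + check(s, true, false));
        }
    }
}
```

In this model the hijack path (the one SSH takes with `assign(region, true, true, true)`) never reaches the abort, so only the non-hijack caller is exposed to it.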
One more thing: there is also a possibility of double assignment.
In the PENDING_OPEN case we cannot decide whether the single assign should retry
or skip the retry (assuming SSH will handle it), because there are multiple cases.
-> If the RS went down after the state was set to PENDING_OPEN, then SSH can
assign the region (as in the patch).
At the same time the single assign also retries, because sending the open RPC
fails with a connection refused exception.
Suppose we instead skip the assign retry on a connection refused exception.
Then there is a problem with this approach.
The scenario is:
1) got region plan
2) destination server went down and SSH has also processed it (SSH sees the
region in OFFLINE state and skips assignment).
3) state changes to PENDING_OPEN
4) sending the open RPC then fails with the connection refused exception and we
skip the assign, so nothing assigns the region.
-> If we skip the assign in SSH instead, that is also a problem.
The scenario is:
1) got region plan and region is in PENDING_OPEN
2) RS just spawned OpenRegionHandler but did not transition the region to OPENING
3) single assign returned, assuming the RS is up and will take care of it
4) RS went down
5) now SSH also skips the region assignment
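
The three interleavings above can be summarized in a small truth-table model. This is only an illustrative sketch under simplifying assumptions: every type and method name is hypothetical (none of it is HBase code), each scenario is reduced to two facts (what state SSH observes, and whether the open RPC is refused), and each policy is reduced to one boolean. Under those assumptions no combination of "single assign retries on connection refused" and "SSH reassigns PENDING_OPEN" gives exactly one assignment in all three scenarios.

```java
// Toy model of the PENDING_OPEN scenarios. Hypothetical names; not HBase code.
final class PendingOpenRaceModel {

    /** Region state that ServerShutdownHandler (SSH) observes for the dead server. */
    enum SeenBySsh { OFFLINE, PENDING_OPEN }

    enum Outcome { STUCK, ASSIGNED_ONCE, DOUBLE_ASSIGNMENT }

    enum Scenario {
        // RS dies right after PENDING_OPEN is set; the open RPC is refused.
        RS_DIES_AFTER_PENDING_OPEN(SeenBySsh.PENDING_OPEN, true),
        // SSH runs while the region is still OFFLINE; the later open RPC is refused.
        SSH_RUNS_BEFORE_PENDING_OPEN(SeenBySsh.OFFLINE, true),
        // Open RPC succeeds; RS dies before moving the region to OPENING.
        RS_DIES_BEFORE_OPENING(SeenBySsh.PENDING_OPEN, false);

        final SeenBySsh seenBySsh;
        final boolean openRpcRefused;

        Scenario(SeenBySsh seenBySsh, boolean openRpcRefused) {
            this.seenBySsh = seenBySsh;
            this.openRpcRefused = openRpcRefused;
        }
    }

    /** Counts how many code paths end up assigning the region. */
    static Outcome outcome(Scenario s, boolean retryOnRefused,
                           boolean sshReassignsPendingOpen) {
        int assignments = 0;
        if (s.openRpcRefused && retryOnRefused) {
            assignments++; // single assign retries on another server
        }
        if (s.seenBySsh == SeenBySsh.PENDING_OPEN && sshReassignsPendingOpen) {
            assignments++; // SSH hijacks and reassigns
        }
        switch (assignments) {
            case 0:  return Outcome.STUCK;
            case 1:  return Outcome.ASSIGNED_ONCE;
            default: return Outcome.DOUBLE_ASSIGNMENT;
        }
    }

    /** True only if every scenario ends with exactly one assignment. */
    static boolean policyIsSafe(boolean retryOnRefused,
                                boolean sshReassignsPendingOpen) {
        for (Scenario s : Scenario.values()) {
            if (outcome(s, retryOnRefused, sshReassignsPendingOpen)
                    != Outcome.ASSIGNED_ONCE) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        boolean[] choices = {true, false};
        for (boolean retry : choices) {
            for (boolean ssh : choices) {
                for (Scenario s : Scenario.values()) {
                    System.out.println("retry=" + retry + " sshReassign=" + ssh
                        + " " + s + " -> " + outcome(s, retry, ssh));
                }
            }
        }
    }
}
```

The model ignores timing and ZooKeeper state entirely; it is meant only to show why neither side can decide locally whether to retry or skip.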
> fix HBASE-6060 (regions stuck in opening state) in 0.94
> -------------------------------------------------------
>
> Key: HBASE-7521
> URL: https://issues.apache.org/jira/browse/HBASE-7521
> Project: HBase
> Issue Type: Bug
> Reporter: Sergey Shelukhin
> Assignee: Sergey Shelukhin
> Attachments: HBASE-7521-v0.patch, HBASE-7521-v1.patch
>
>
> Discussion in HBASE-6060 implies that the fix there does not work on 0.94.
> Still, we may want to fix the issue in 0.94 (via some different fix) because
> regions stuck in opening for ridiculous amounts of time are not a good
> thing to have.