ndimiduk commented on a change in pull request #2014:
URL: https://github.com/apache/hbase/pull/2014#discussion_r492290457
##########
File path:
hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/TransitRegionStateProcedure.java
##########
@@ -200,14 +200,21 @@ private void queueAssign(MasterProcedureEnv env, RegionStateNode regionNode)
}
}
- private void openRegion(MasterProcedureEnv env, RegionStateNode regionNode) throws IOException {
+ private void openRegion(MasterProcedureEnv env, RegionStateNode regionNode)
+ throws IOException, ProcedureSuspendedException {
ServerName loc = regionNode.getRegionLocation();
if (loc == null) {
LOG.warn("No location specified for {}, jump back to state {} to get one", getRegion(),
RegionStateTransitionState.REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE);
setNextState(RegionStateTransitionState.REGION_STATE_TRANSITION_GET_ASSIGN_CANDIDATE);
return;
}
+ final boolean isMeta = regionNode.getRegionInfo().isMetaRegion();
+ final boolean isMetaAvailable = !env.getAssignmentManager().isMetaRegionInTransition();
+ if (!isMeta && !isMetaAvailable) {
+ // meta is not assigned yet, so yield
+ throw new ProcedureSuspendedException();
Review comment:
Update the patch to use existing mechanisms in `HMaster` for deciding if
meta is assigned and the server hosting it is online.
> So in general, we should have a way to deal with meta update failure (maybe
> just a retry at procedure level?) and have a smaller timeout on the meta
> update operation.
Looks like we deal with meta update failure by aborting the master, at least
as of
[ee3e2b974b](https://github.com/apache/hbase/blob/ee3e2b974ba0653e0c79d45d6e15fd30b4557167/hbase-server/src/main/java/org/apache/hadoop/hbase/master/assignment/RegionStateStore.java#L221):
```
  private void updateRegionLocation(RegionInfo regionInfo, State state, Put put)
      throws IOException {
    try (Table table = master.getConnection().getTable(TableName.META_TABLE_NAME)) {
      table.put(put);
    } catch (IOException e) {
      // TODO: Revist!!!! Means that if a server is loaded, then we will abort our host!
      // In tests we abort the Master!
      String msg = String.format("FAILED persisting region=%s state=%s",
        regionInfo.getShortNameToLog(), state);
      LOG.error(msg, e);
      master.abort(msg, e);
      throw e;
    }
  }
```
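For contrast with the abort-on-failure path above, here is a rough sketch of what "just a retry" could look like in isolation. It is not HBase code: `MetaUpdateRetry`, `MetaWrite`, and `runWithRetry` are hypothetical names made up for illustration, and `MetaWrite` only stands in for the `table.put(put)` call. It also blocks with `Thread.sleep`, which a real procedure-level retry would presumably avoid by suspending the procedure instead.

```java
import java.io.IOException;

/** Hypothetical sketch only; none of these names exist in HBase. */
final class MetaUpdateRetry {

  /** Stand-in for the write to meta (the table.put(put) call in the quoted code). */
  interface MetaWrite {
    void run() throws IOException;
  }

  /**
   * Runs the write up to maxAttempts times, doubling the sleep between failed
   * attempts. Returns the number of attempts used; rethrows the last
   * IOException once the attempts are exhausted, leaving the abort-or-not
   * decision to the caller.
   */
  static int runWithRetry(MetaWrite write, int maxAttempts, long initialBackoffMs)
      throws IOException, InterruptedException {
    if (maxAttempts < 1) {
      throw new IllegalArgumentException("maxAttempts must be >= 1");
    }
    long backoffMs = initialBackoffMs;
    for (int attempt = 1; ; attempt++) {
      try {
        write.run();
        return attempt;
      } catch (IOException e) {
        if (attempt >= maxAttempts) {
          throw e; // out of attempts: surface the failure
        }
        Thread.sleep(backoffMs); // blocking wait; a procedure would suspend here
        backoffMs *= 2;
      }
    }
  }

  public static void main(String[] args) throws Exception {
    int[] calls = {0};
    // Simulated write that fails twice before succeeding.
    int attempts = runWithRetry(() -> {
      if (++calls[0] < 3) {
        throw new IOException("meta not available yet");
      }
    }, 5, 10L);
    System.out.println("succeeded after " + attempts + " attempts");
  }
}
```

The only point of the sketch is that a bounded retry keeps the failure local and lets the caller decide what to do once the attempts run out, rather than aborting the master on the first `IOException`.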