wchevreuil commented on a change in pull request #1071: HBASE-23693 Split
failure may cause region hole and data loss when use zk assign
URL: https://github.com/apache/hbase/pull/1071#discussion_r368946215
##########
File path:
hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java
##########
@@ -783,10 +801,20 @@ public void regionOffline(
LOG.info("Found region in " + state +
" to be reassigned by ServerCrashProcedure for " + sn);
rits.add(hri);
- } else if(state.isSplittingNew() || state.isMergingNew()) {
- LOG.info("Offline/Cleanup region if no meta entry exists, hri: " + hri +
- " state: " + state);
- regionsToClean.add(state.getRegion());
+ } else if (state.isSplittingNew() || state.isMergingNew()) {
+ LOG.info(
+ "Offline/Cleanup region if no meta entry exists, hri: " + hri + " state: " + state);
+ if (daughter2Parent.containsKey(hri.getEncodedName())) {
+ HRegionInfo parent = daughter2Parent.get(hri.getEncodedName());
+ HRegionInfo info = getHRIFromMeta(parent);
+ if (info != null && info.isSplit() && info.isOffline()) {
+ regionsToClean.add(Pair.newPair(state.getRegion(), info));
Review comment:
> So if Active Master also crashes before it triggers SCP, the daughter won't be deleted.
Yes, that's my point. We would now have a new active master that sees the parent
split as complete, although the split was only midway through. It could then
remove the parent and fail to online the daughters.
I wonder if correcting the region state and split flag updates
would resolve split failures across these different scenarios. It also seems
inconsistent that we make these updates in meta only and don't reflect them in the
"in-memory" region info the master holds.
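To make the meta vs. in-memory point concrete, here is a rough sketch of the kind of consistency check I have in mind. It is plain Java, not HBase code: `RegionFlags`, `SplitStateCheck` and `findMismatches` are made-up names for illustration, and the two maps are simplified stand-ins for what meta and the master's in-memory state would report for each encoded region name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch only: none of these types exist in HBase.
final class SplitStateCheck {

  /** Simplified stand-in for the split/offline flags carried by a region info. */
  static final class RegionFlags {
    final boolean split;
    final boolean offline;

    RegionFlags(boolean split, boolean offline) {
      this.split = split;
      this.offline = offline;
    }
  }

  /**
   * Returns the encoded names of regions whose flags recorded in meta differ
   * from the in-memory copy, or that are missing from the in-memory copy.
   */
  static List<String> findMismatches(Map<String, RegionFlags> meta,
      Map<String, RegionFlags> inMemory) {
    List<String> mismatched = new ArrayList<>();
    for (Map.Entry<String, RegionFlags> entry : meta.entrySet()) {
      RegionFlags fromMeta = entry.getValue();
      RegionFlags fromMemory = inMemory.get(entry.getKey());
      if (fromMemory == null
          || fromMemory.split != fromMeta.split
          || fromMemory.offline != fromMeta.offline) {
        mismatched.add(entry.getKey());
      }
    }
    return mismatched;
  }
}
```

A parent marked split and offline in meta, but still unsplit and online in memory, would show up in the returned list. That is the situation a newly active master could misread as a completed split.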