wchevreuil commented on a change in pull request #1071: HBASE-23693 Split
failure may cause region hole and data loss when use zk assign
URL: https://github.com/apache/hbase/pull/1071#discussion_r368948523
##########
File path: hbase-server/src/main/java/org/apache/hadoop/hbase/master/RegionStates.java
##########
@@ -758,8 +764,20 @@ public void regionOffline(
// Delete the ZNode if exists
ZKAssign.deleteNodeFailSilent(watcher, region);
regionsToOffline.add(region);
+ PairOfSameType<HRegionInfo> daughterRegions =
+ MetaTableAccessor.getDaughterRegionsFromParent(this.server.getConnection(), region);
+ if (daughterRegions != null) {
+ if (daughterRegions.getFirst() != null) {
+ daughter2Parent.put(daughterRegions.getFirst().getEncodedName(), region);
+ }
+ if (daughterRegions.getSecond() != null) {
+ daughter2Parent.put(daughterRegions.getSecond().getEncodedName(), region);
+ }
+ }
} catch (KeeperException ke) {
server.abort("Unexpected ZK exception deleting node " + region, ke);
+ } catch (IOException e) {
+ LOG.warn("get daughter from meta exception " + region, e);
Review comment:
> If we update the meta information later, we can only put it after the
> completion of execute openDaughters.

Yeah, that's what I think.

> It is not very clear to me what impact this might have now. I may need
> more detailed and comprehensive thinking. My idea is to merge this patch first,
> at least it can solve most of the problems. Do you think it is okay?

If you think it's too much work to fix the state/split flag updates on
this PR, then yeah, we can merge this one for now and work on the other
solution in a different jira/PR.
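
For reference while reviewing, the null-tolerant daughter-to-parent bookkeeping that the hunk above adds can be sketched in isolation roughly as follows. This is only an illustration, not the patch itself: `DaughterIndexSketch`, `DaughterPair`, `recordDaughters` and `parentOf` are hypothetical names, plain strings stand in for `HRegionInfo` and encoded region names, and no meta or ZooKeeper access is modelled.

```java
import java.util.HashMap;
import java.util.Map;

public class DaughterIndexSketch {
    /** Hypothetical stand-in for PairOfSameType<HRegionInfo>; either side may be null. */
    static final class DaughterPair {
        final String first;
        final String second;

        DaughterPair(String first, String second) {
            this.first = first;
            this.second = second;
        }
    }

    // Maps a daughter's (assumed) encoded name to its parent region.
    private final Map<String, String> daughter2Parent = new HashMap<>();

    /** Records daughter -> parent for each daughter that is present. */
    void recordDaughters(String parent, DaughterPair daughters) {
        if (daughters == null) {
            return; // parent has no split daughters recorded
        }
        if (daughters.first != null) {
            daughter2Parent.put(daughters.first, parent);
        }
        if (daughters.second != null) {
            daughter2Parent.put(daughters.second, parent);
        }
    }

    /** Returns the parent recorded for the daughter, or null if none. */
    String parentOf(String daughter) {
        return daughter2Parent.get(daughter);
    }

    public static void main(String[] args) {
        DaughterIndexSketch index = new DaughterIndexSketch();
        index.recordDaughters("parentA", new DaughterPair("d1", null));
        index.recordDaughters("parentB", null);
        System.out.println(index.parentOf("d1")); // prints "parentA"
        System.out.println(index.parentOf("d2")); // prints "null"
    }
}
```

The sketch only shows that a missing pair or a missing daughter is skipped rather than treated as an error; when the real map is updated relative to openDaughters is the open question in this thread and is not addressed here.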