Esteban Gutierrez created HBASE-12131:
-----------------------------------------
Summary: [hbck] undeployRegions should handle gracefully network
partitions and other exceptions to avoid the same region deployed multiple times
Key: HBASE-12131
URL: https://issues.apache.org/jira/browse/HBASE-12131
Project: HBase
Issue Type: Bug
Components: hbck
Affects Versions: 0.94.23
Reporter: Esteban Gutierrez
Assignee: Esteban Gutierrez
Priority: Critical
If we get an IOException (we currently ignore it) while hbck is undeploying
regions, we should make sure the master does not re-assign that region before
we know that the RS was marked as dead, and optionally let the user confirm
that action. Otherwise we will end up in a split-brain situation, with clients
talking to different RSs serving the same region.
The offending part is here in HBaseFsck.undeployRegions():
{code}
private void undeployRegions(HbckInfo hi) throws IOException,
    InterruptedException {
  for (OnlineEntry rse : hi.deployedEntries) {
    LOG.debug("Undeploy region " + rse.hri + " from " + rse.hsa);
    try {
      HBaseFsckRepair.closeRegionSilentlyAndWait(admin, rse.hsa, rse.hri);
      offline(rse.hri.getRegionName());
    } catch (IOException ioe) {
      LOG.warn("Got exception when attempting to offline region "
          + Bytes.toString(rse.hri.getRegionName()), ioe);
    }
  }
}
{code}
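One possible direction, as a rough sketch only and not the actual patch: instead of swallowing the IOException, collect every deployment whose close could not be confirmed and whose server is not known to be dead, and refuse to re-assign while that list is non-empty. The code below does not use HBase classes. UndeploySketch, RegionCloser, DeadServerCheck and Deployment are hypothetical stand-ins for the close call, the dead-server lookup in the master and OnlineEntry.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch only: all names here are hypothetical, none of them are HBase APIs.
class UndeploySketch {

  /** Stand-in for the close request sent to a region server. */
  public interface RegionCloser {
    void close(String server, String region) throws IOException;
  }

  /** Stand-in for asking the master whether a server was marked as dead. */
  public interface DeadServerCheck {
    boolean isDead(String server);
  }

  /** One deployment of a region on a server (plays the role of OnlineEntry). */
  public static final class Deployment {
    public final String server;
    public final String region;

    public Deployment(String server, String region) {
      this.server = server;
      this.region = region;
    }
  }

  /**
   * Tries to close every deployment. Returns the deployments that could not be
   * confirmed closed: the close threw an IOException and the server is not
   * known to be dead. The caller should not re-assign while this is non-empty.
   */
  public static List<Deployment> undeploy(List<Deployment> deployed,
      RegionCloser closer, DeadServerCheck deadCheck) {
    List<Deployment> unconfirmed = new ArrayList<>();
    for (Deployment d : deployed) {
      try {
        closer.close(d.server, d.region);
      } catch (IOException ioe) {
        // Do not ignore the failure: the region may still be served there.
        if (!deadCheck.isDead(d.server)) {
          unconfirmed.add(d);
        }
      }
    }
    return unconfirmed;
  }

  public static void main(String[] args) {
    List<Deployment> deployed = Arrays.asList(
        new Deployment("rs1", "regionA"),
        new Deployment("rs2", "regionA"));
    // Simulate a network partition towards rs2; no server is marked dead.
    List<Deployment> unconfirmed = undeploy(deployed,
        (server, region) -> {
          if (server.equals("rs2")) {
            throw new IOException("simulated network partition");
          }
        },
        server -> false);
    for (Deployment d : unconfirmed) {
      System.out.println("Not safe to re-assign " + d.region
          + ": close on " + d.server + " was not confirmed");
    }
  }
}
```

With something along these lines, hbck could skip the re-assignment or prompt the user whenever the returned list is not empty, rather than continuing as if the close had succeeded.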
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)