[
https://issues.apache.org/jira/browse/HBASE-14820?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15009859#comment-15009859
]
Heng Chen commented on HBASE-14820:
-----------------------------------
As the region server log shows, the split was in state OFFLINED_PARENT when
it was rolled back.
The rollback logic for OFFLINED_PARENT only adds the parent region back to
the online regions. But the parent region was already closed at the
CLOSED_PARENT_REGION step, and nothing reopens it.
Related code below:
{code}
case CLOSED_PARENT_REGION:
  try {
    // So, this returns a seqid but if we just closed and then reopened, we
    // should be ok. On close, we flushed using sequenceid obtained from
    // hosting regionserver so no need to propagate the sequenceid returned
    // out of initialize below up into regionserver as we normally do.
    // TODO: Verify.
    this.parent.initialize();
  } catch (IOException e) {
    LOG.error("Failed rollbacking CLOSED_PARENT_REGION of region " +
      parent.getRegionInfo().getRegionNameAsString(), e);
    throw new RuntimeException(e);
  }
  break;
.......
case OFFLINED_PARENT:
  if (services != null) services.addToOnlineRegions(this.parent);
  break;
{code}
IMO, before we add the parent region back to the online regions, we should
call {{this.parent.initialize()}}. Thoughts?
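To make the suspected sequence concrete, here is a toy model of it. This is
not HBase code: ToyRegion, ToyServer and rollbackFromOfflinedParent are
hypothetical names invented for illustration, and the model only mirrors the
behavior as described in this comment (close the parent, take it offline,
then roll back from OFFLINED_PARENT), not the real SplitTransaction.
{code:java}
import java.util.HashMap;
import java.util.Map;

/** Toy model of the rollback path; none of these types are HBase classes. */
public class SplitRollbackSketch {

    /** Hypothetical stand-in for the parent region. */
    static class ToyRegion {
        final String name;
        private boolean closed = false;

        ToyRegion(String name) { this.name = name; }

        void close() { closed = true; }
        void initialize() { closed = false; } // models reopening the region
        boolean isClosed() { return closed; }
    }

    /** Hypothetical stand-in for the region server's online-region map. */
    static class ToyServer {
        private final Map<String, ToyRegion> online = new HashMap<>();

        void addToOnlineRegions(ToyRegion r) { online.put(r.name, r); }
        void removeFromOnlineRegions(ToyRegion r) { online.remove(r.name); }

        /** What a status page would report: the region is listed. */
        boolean isListedOnline(String name) { return online.containsKey(name); }

        /** A request succeeds only if the region is listed and actually open. */
        boolean canServe(String name) {
            ToyRegion r = online.get(name);
            return r != null && !r.isClosed();
        }
    }

    /** Rolls back from OFFLINED_PARENT; 'reinitialize' toggles the proposed fix. */
    static ToyServer rollbackFromOfflinedParent(boolean reinitialize) {
        ToyServer server = new ToyServer();
        ToyRegion parent = new ToyRegion("parent");
        server.addToOnlineRegions(parent);

        // Forward steps of the split before the timeout.
        parent.close();                          // CLOSED_PARENT_REGION
        server.removeFromOnlineRegions(parent);  // OFFLINED_PARENT

        // Rollback as described above for OFFLINED_PARENT.
        if (reinitialize) {
            parent.initialize();                 // the proposed extra step
        }
        server.addToOnlineRegions(parent);
        return server;
    }

    public static void main(String[] args) {
        ToyServer current = rollbackFromOfflinedParent(false);
        System.out.println("without initialize: listed="
            + current.isListedOnline("parent")
            + ", serving=" + current.canServe("parent"));

        ToyServer proposed = rollbackFromOfflinedParent(true);
        System.out.println("with initialize: listed="
            + proposed.isListedOnline("parent")
            + ", serving=" + proposed.canServe("parent"));
    }
}
{code}
In this model, a rollback without the extra step leaves the region listed as
online but unable to serve requests, which would be consistent with the
reported symptoms (shown as open in the web UI, yet requests fail). Whether
the real code path behaves the same way still needs to be verified against
the logs.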
> Region becomes unavailable after a region split is rolled back
> --------------------------------------------------------------
>
> Key: HBASE-14820
> URL: https://issues.apache.org/jira/browse/HBASE-14820
> Project: HBase
> Issue Type: Bug
> Components: master, regionserver
> Affects Versions: 0.98.15
> Reporter: Clara Xiong
> Attachments: HBASE-14820-RegionServer.log, HBSE-14820-hmaster.log
>
>
> After the region server rolls back a timed-out region split attempt, the
> region becomes unavailable.
> Symptoms:
> The RS displays the region open in the web UI.
> The meta table still points to the RS.
> Requests for the region receive a NotServingRegionException.
> hbck reports 0 inconsistencies.
> Moving the region fails.
> Restarting the region server fixes the problem.
> We have seen multiple occurrences that required operator intervention.