[
https://issues.apache.org/jira/browse/HBASE-4880?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13165044#comment-13165044
]
stack commented on HBASE-4880:
------------------------------
@Chunhui Nice patch. If you are going to make a new version to address Ram
comment, here are some other comments to consider:
I think your patch can be smaller. Much of your patch is trying to explain
that postopendeploy now no longer onlines a region as part of its opening. I
think you do not have to explain this change in multiple places in the code;
just change the code to no longer do it. For example, this part of diff looks
unneeded to me:
{code}
@@ -110,10 +109,14 @@
tryTransitionToFailedOpen(regionInfo);
return;
}
-
boolean failed = true;
if (tickleOpening("post_region_open")) {
- if (updateMeta(region)) failed = false;
+ // Just update .META. table ,but not add it to online regions.
+ // We shouldn't serve this region before master receive region opened
+ // message.(see HBASE-4880)
+ if (updateMeta(region)) {
+ failed = false;
+ }
}
if (failed || this.server.isStopped() ||
this.rsServices.isStopping()) {
{code}
Its just comments but I think that to a reader who comes along later, they
won't be able to make much sense of what the comment is about -- not without
hbase-4880 (in other words, the comment will have a short shelf-life; it makes
sense to the few of us who have been looking at this bit of code but soon we'll
all forget that postopendeploy used online the region too.
Similar with this change:
{code}
@@ -212,9 +217,11 @@
}
/**
- * Thread to run region post open tasks. Call {@link #getException()} after
- * the thread finishes to check for exceptions running
- * {@link RegionServerServices#postOpenDeployTasks(HRegion,
org.apache.hadoop.hbase.catalog.CatalogTracker, boolean)}.
+ * Thread to run region post open tasks, which would update meta, but would
+ * not add region to OnlineRegions . Call {@link #getException()} after the
+ * thread finishes to check for exceptions running
+ * {@link RegionServerServices#postOpenDeployTasks(HRegion,
org.apache.hadoop.hbase.catalog.CatalogTracker, boolean)}
+ * .
*/
static class PostOpenDeployTasksThread extends Thread {
private Exception exception = null;
{code}
and the change to the class RegionServerServices.
Otherwise, patch is great. +1. Would be nice to have a test for it but its a
bit too ornery of a context to conjure in tests so I'd be fine w/ it going into
trunk and 0.92.
> Region is on service before openRegionHandler completes, may cause data loss
> ----------------------------------------------------------------------------
>
> Key: HBASE-4880
> URL: https://issues.apache.org/jira/browse/HBASE-4880
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.92.0, 0.94.0
> Reporter: chunhui shen
> Assignee: chunhui shen
> Attachments: 4880.txt, hbase-4880.patch, hbase-4880v2.patch,
> hbase-4880v3.patch
>
>
> OpenRegionHandler in regionserver is processed as the following steps:
> {code}
> 1.openregion()(Through it, closed = false, closing = false)
> 2.addToOnlineRegions(region)
> 3.update .meta. table
> 4.update ZK's node state to RS_ZK_REGION_OPEND
> {code}
> We can find that region is on service before Step 4.
> It means client could put data to this region after step 3.
> What will happen if step 4 is failed processing?
> It will execute OpenRegionHandler#cleanupFailedOpen which will do closing
> region, and master assign this region to another regionserver.
> If closing region is failed, the data which is put between step 3 and step 4
> may loss, because the region has been opend on another regionserver and be
> put new data. Therefore, it may not be recoverd through replayRecoveredEdit()
> because the edit's LogSeqId is smaller than current region SeqId.
--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators:
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira