[ https://issues.apache.org/jira/browse/HBASE-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Ted Yu updated HBASE-4452:
--------------------------
Resolution: Fixed
Hadoop Flags: [Reviewed]
Status: Resolved (was: Patch Available)
> Possibility of RS opening a region though tickleOpening fails due to znode
> version mismatch
> -------------------------------------------------------------------------------------------
>
> Key: HBASE-4452
> URL: https://issues.apache.org/jira/browse/HBASE-4452
> Project: HBase
> Issue Type: Bug
> Reporter: ramkrishna.s.vasudevan
> Assignee: ramkrishna.s.vasudevan
> Priority: Critical
> Fix For: 0.90.5
>
> Attachments: 4452.90, HBASE-4452.patch
>
>
> Consider the following code
> {code}
> long period = Math.max(1, assignmentTimeout / 3);
> long lastUpdate = now;
> while (!signaller.get() && t.isAlive() && !this.server.isStopped() &&
>     !this.rsServices.isStopping() && (endTime > now)) {
>   long elapsed = now - lastUpdate;
>   if (elapsed > period) {
>     // Only tickle OPENING if postOpenDeployTasks is taking some time.
>     lastUpdate = now;
>     tickleOpening("post_open_deploy");
>   }
> {code}
> Whenever postOpenDeployTasks takes considerable time, we call tickleOpening
> so that no timeout is detected. But if the TimeoutMonitor assigns the region
> to another RS before the tickle happens, the other RS moves the node from
> OFFLINE to OPENING. Hence, when the first RS then calls tickleOpening, the
> operation fails. Here lies the problem:
> {code}
> String encodedName = this.regionInfo.getEncodedName();
> try {
>   this.version =
>     ZKAssign.retransitionNodeOpening(server.getZooKeeper(),
>       this.regionInfo, this.server.getServerName(), this.version);
> } catch (KeeperException e) {
> {code}
> Now this.version becomes -1 because the operation failed.
> As the first code snippet shows, the return value of tickleOpening() is not
> checked, so after it fails we still go on to move the node to OPENED. Here
> again no check catches this condition, because the version has already been
> changed to -1. Hence the OPENING to OPENED transition succeeds, which creates
> a chance of double assignment.
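
The sequence described above can be replayed against a small in-memory model. This is only a sketch: `FakeZnode` and its `transition` method are hypothetical stand-ins and not the ZKAssign API. The one assumption carried over from the description is that an expected version of -1 skips the version comparison, as ZooKeeper's own setData does when given version -1.

```java
public class StaleVersionSketch {
    /** Hypothetical stand-in for an unassigned-region znode; not the ZKAssign API. */
    static final class FakeZnode {
        String state = "OFFLINE";
        int version = 0;

        /**
         * Moves the node from one state to another. Returns the new version,
         * or -1 on failure. Assumption: expectedVersion == -1 skips the
         * version comparison, mirroring ZooKeeper's "-1 matches any version".
         */
        int transition(String from, String to, int expectedVersion) {
            if (!state.equals(from)) return -1;
            if (expectedVersion != -1 && expectedVersion != version) return -1;
            state = to;
            return ++version;
        }
    }

    /** Replays the interleaving from the description; true if the stale RS still reaches OPENED. */
    static boolean staleServerReachesOpened() {
        FakeZnode node = new FakeZnode();
        int rs1Version = node.transition("OFFLINE", "OPENING", 0); // RS1 starts opening

        // TimeoutMonitor reassigns: node is forced back to OFFLINE, RS2 moves it to OPENING.
        node.state = "OFFLINE";
        node.version++;
        node.transition("OFFLINE", "OPENING", -1);

        // RS1 tickles with its stale version: the call fails and -1 overwrites its version.
        rs1Version = node.transition("OPENING", "OPENING", rs1Version);

        // RS1 ignores the failure and moves to OPENED with expected version -1.
        return node.transition("OPENING", "OPENED", rs1Version) != -1;
    }

    public static void main(String[] args) {
        System.out.println("stale RS reached OPENED: " + staleServerReachesOpened());
    }
}
```

In this model the final transition succeeds only because the failed tickle replaced the stale version with -1; had RS1 kept its old version, the OPENING to OPENED step would have been rejected as well.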
> {noformat}
> 2011-09-22 00:57:29,930 WARN org.apache.hadoop.hbase.zookeeper.ZKAssign:
> regionserver:60020-0x1328ceaa1ff000d Attempt to transition the unassigned
> node for 69797d064f773d1aa9adba56e7ff90a3 from RS_ZK_REGION_OPENING to
> RS_ZK_REGION_OPENING failed, the node existed but was version 5 not the
> expected version 2
> 2011-09-22 00:57:33,494 WARN
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Failed
> refreshing OPENING; region=69797d064f773d1aa9adba56e7ff90a3,
> context=post_open_deploy
> 2011-09-22 00:58:02,356 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
> regionserver:60020-0x1328ceaa1ff000d Attempting to transition node
> 69797d064f773d1aa9adba56e7ff90a3 from RS_ZK_REGION_OPENING to
> RS_ZK_REGION_OPENED
> 2011-09-22 00:58:11,853 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign:
> regionserver:60020-0x1328ceaa1ff000d Successfully transitioned node
> 69797d064f773d1aa9adba56e7ff90a3 from RS_ZK_REGION_OPENING to
> RS_ZK_REGION_OPENED
> 2011-09-22 00:58:13,956 DEBUG
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: Opened
> t9,,1316633193606.69797d064f773d1aa9adba56e7ff90a3.
> {noformat}
> Correct me if this analysis is wrong.
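
One possible guard, sketched under assumptions: treat a failed tickle as fatal for this open attempt, and never store -1 as the version to use in later transitions. The names below (`OpenAttempt`, `tickle`, `mayTransitionToOpened`) are hypothetical and are not taken from the attached patches or from OpenRegionHandler.

```java
import java.util.function.IntUnaryOperator;

public class TickleGuardSketch {
    /** Hypothetical holder for one region-open attempt; not HBase's OpenRegionHandler. */
    static final class OpenAttempt {
        private int version;            // last znode version this server wrote
        private boolean lostOwnership;  // set once a refresh of OPENING fails

        OpenAttempt(int initialVersion) {
            this.version = initialVersion;
        }

        /**
         * Refreshes OPENING through the supplied retransition function, which
         * takes the expected version and returns the new version, or -1 on
         * failure. Returns false on failure so the caller can abort the open.
         */
        boolean tickle(IntUnaryOperator retransition) {
            int newVersion = retransition.applyAsInt(version);
            if (newVersion == -1) {
                lostOwnership = true;   // keep the old version; never store -1
                return false;
            }
            version = newVersion;
            return true;
        }

        /** The OPENING to OPENED step is allowed only while ownership is intact. */
        boolean mayTransitionToOpened() {
            return !lostOwnership;
        }

        int version() {
            return version;
        }
    }

    public static void main(String[] args) {
        OpenAttempt attempt = new OpenAttempt(2);
        // Mirrors the log above: the znode is at version 5, but version 2 was expected.
        boolean ok = attempt.tickle(expected -> expected == 5 ? 6 : -1);
        System.out.println("tickle ok=" + ok + ", may open=" + attempt.mayTransitionToOpened());
    }
}
```

With a guard of this shape, the loop in the first snippet would check the result of the tickle and stop instead of continuing to the OPENED transition.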
--
This message is automatically generated by JIRA.
For more information on JIRA, see: http://www.atlassian.com/software/jira