[
https://issues.apache.org/jira/browse/HDFS-6137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14136517#comment-14136517
]
Tsz Wo Nicholas Sze commented on HDFS-6137:
-------------------------------------------
> I think this is fixed in trunk now after HDFS-6800 and HDFS-6981.
This problem is probably different from HDFS-6800 and HDFS-6981.
Let me give some background. Federation added block pools to the datanode data
directory. The directory structure becomes
{noformat}
data +- current +- pool_1 +- current
     |          |         +- previous
     |          |
     |          +- pool_2 +- current
     |                    +- previous
     |
     +- previous
{noformat}
As a result, there are two levels of VERSION files: data/current/VERSION and
data/current/pool_x/current/VERSION. During upgrade, both VERSION files are
overwritten with the new version. For rollback, since we may roll back only an
individual block pool, only data/current/pool_x/current/VERSION is restored but
not data/current/VERSION. The two files then disagree, and we get a version
mismatch.
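A minimal sketch of how the mismatch could be checked by hand. It assumes that both VERSION files are plain key=value property files containing a layoutVersion entry; the helper names are hypothetical and not part of Hadoop.

```python
from pathlib import Path


def read_layout_version(version_file):
    """Return the layoutVersion value from a key=value VERSION file."""
    for line in Path(version_file).read_text().splitlines():
        line = line.strip()
        # Skip blank lines and the '#' comment lines of a properties file.
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        if key.strip() == "layoutVersion":
            return int(value.strip())
    raise ValueError(f"no layoutVersion entry in {version_file}")


def layout_versions_match(data_dir, pool_name):
    """Compare data/current/VERSION with data/current/<pool>/current/VERSION."""
    top = read_layout_version(Path(data_dir) / "current" / "VERSION")
    pool = read_layout_version(
        Path(data_dir) / "current" / pool_name / "current" / "VERSION")
    return top == pool
```

After a rollback that restored only the block pool's VERSION file, layout_versions_match would be expected to return False for that pool.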
We found the problem in HDFS-5526. At that time we added code to overwrite the
data/current/VERSION file during rollback. It worked fine.
However, software versions that have Federation but not HDFS-5526 still have
the problem, so they cannot roll back. This is the bug described here.
I think we can only advise users to roll back manually (manually change the
data/current/VERSION file to the old version); we cannot change the (old)
software to fix the bug.
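The manual change could be sketched as follows. This is only an illustration, assuming the VERSION file is a plain key=value property file with a layoutVersion entry; the function name and the .bak backup file are hypothetical, and the old layout version has to be supplied by the user.

```python
import shutil
from pathlib import Path


def set_layout_version(version_file, old_layout_version):
    """Rewrite the layoutVersion entry, keeping a backup copy of the file."""
    path = Path(version_file)
    # Keep the original next to the file in case the edit has to be undone.
    shutil.copy2(path, path.with_name(path.name + ".bak"))

    lines = path.read_text().splitlines()
    replaced = False
    for i, line in enumerate(lines):
        key, sep, _ = line.partition("=")
        # Only touch the layoutVersion entry; all other lines stay as they are.
        if sep and key.strip() == "layoutVersion":
            lines[i] = f"layoutVersion={old_layout_version}"
            replaced = True
    if not replaced:
        raise ValueError(f"no layoutVersion entry in {path}")
    path.write_text("\n".join(lines) + "\n")
```

For the log in the description below, the call might look like set_layout_version("/data/hdfs/data/current/VERSION", -40), where the path and the value -40 are read off that log and would differ on other clusters.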
> Datanode cannot rollback because LayoutVersion incorrect
> --------------------------------------------------------
>
> Key: HDFS-6137
> URL: https://issues.apache.org/jira/browse/HDFS-6137
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: datanode
> Affects Versions: 2.0.5-alpha
> Reporter: Fengdong Yu
>
> Upgrading from hadoop-2.0.5-alpha (QJM HA enabled) to the latest trunk (HA
> disabled) was successful. Then I stopped the cluster and rolled back, and it
> threw this exception:
> {code}
> 2014-03-21 18:33:19,384 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for block pool Block pool BP-1123524590-10.204.8.135-1395397158134 (storage id DS-1123524590-10.204.8.135-50010-1395397185148) service to 10-204-8-135/10.204.8.135:9000
> org.apache.hadoop.hdfs.server.common.IncorrectVersionException: Unexpected version of storage directory /data/hdfs/data/current/BP-1123524590-10.204.8.135-1395397158134. Reported: -55. Expecting = -40.
>     at org.apache.hadoop.hdfs.server.common.Storage.setLayoutVersion(Storage.java:1083)
>     at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.setFieldsFromProperties(BlockPoolSliceStorage.java:217)
>     at org.apache.hadoop.hdfs.server.common.Storage.readProperties(Storage.java:922)
>     at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.doTransition(BlockPoolSliceStorage.java:244)
>     at org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:145)
>     at org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:234)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:913)
>     at org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:884)
>     at org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:280)
>     at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:222)
>     at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:664)
>     at java.lang.Thread.run(Thread.java:744)
> {code}
>
> I looked at the datanode dir: $datanode.dir/VERSION is always new. When we
> upgrade, this file is overwritten, so rollback MUST fail.