[ 
https://issues.apache.org/jira/browse/HDFS-6137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14136517#comment-14136517
 ] 

Tsz Wo Nicholas Sze commented on HDFS-6137:
-------------------------------------------

> I think this is fixed in trunk now after HDFS-6800 and HDFS-6981.

This problem is probably different from HDFS-6800 and HDFS-6981.

Let me give some background.  Federation added block pools to the datanode data 
directory.  The directory structure becomes

{noformat}
data +- current  +- pool_1 +- current
     |           |         +- previous
     |           |        
     |           +- pool_2 +- current
     |                     +- previous
     |
     +- previous
{noformat}

Then we have two levels of VERSION files, data/current/VERSION and 
data/current/pool_x/current/VERSION.  During upgrade, both VERSION files are 
overwritten with the new version.  For rollback, since we may roll back only an 
individual block pool, only data/current/pool_x/current/VERSION is restored but 
not data/current/VERSION.  Then we get a version mismatch.
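
To illustrate the mismatch, here is a minimal Python sketch (not code from 
Hadoop) that compares the layout version in data/current/VERSION with the one 
in each block pool's VERSION file.  It assumes that VERSION files are plain 
key=value text with a layoutVersion key, and that block pool directories are 
named BP-* as in the path shown in the exception.  The helper names 
read_version and find_layout_mismatches are hypothetical.

```python
from pathlib import Path


def read_version(path):
    """Parse a VERSION file as key=value lines, skipping '#' comment lines.

    Assumption: the file is simple properties-style text.
    """
    props = {}
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        props[key.strip()] = value.strip()
    return props


def find_layout_mismatches(data_dir):
    """Return (pool_name, pool_version, top_version) for every block pool
    whose layoutVersion differs from the one in data/current/VERSION."""
    current = Path(data_dir) / "current"
    top = read_version(current / "VERSION").get("layoutVersion")
    mismatches = []
    # Assumption: block pool directories are named BP-*.
    for version_file in sorted(current.glob("BP-*/current/VERSION")):
        pool = read_version(version_file).get("layoutVersion")
        if pool != top:
            pool_name = version_file.parent.parent.name
            mismatches.append((pool_name, pool, top))
    return mismatches
```

After a block pool rollback on an affected version, such a check would be 
expected to report the pool with the old layout version while 
data/current/VERSION still carries the new one.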

We found the problem in HDFS-5526.  At that time we added code to overwrite the 
data/current/VERSION file during rollback.  It worked fine.

However, software versions that have Federation but not HDFS-5526 still have 
the problem, so they cannot roll back.  This is the bug described here.

I think we can only advise users to roll back manually (manually change the 
data/current/VERSION file to the old version); we cannot change the (old) 
software to fix the bug.
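
The manual change could be scripted roughly as follows.  This is only a 
sketch under assumptions, not a supported tool: it assumes the datanode is 
stopped, that data/current/VERSION is plain key=value text with a 
layoutVersion key, and that the operator knows the old layout version (for 
example, -40 in the exception reported in this issue).  The function name 
set_layout_version and the .bak backup suffix are hypothetical.

```python
import shutil
from pathlib import Path


def set_layout_version(version_file, old_layout_version):
    """Rewrite the layoutVersion line of a VERSION file, keeping a backup.

    Assumption: the file is key=value text containing a layoutVersion key.
    Returns the path of the backup copy.
    """
    path = Path(version_file)
    found = False
    out = []
    for line in path.read_text().splitlines():
        if "=" in line and line.split("=", 1)[0].strip() == "layoutVersion":
            out.append(f"layoutVersion={old_layout_version}")
            found = True
        else:
            out.append(line)  # keep comments and all other keys untouched
    if not found:
        raise ValueError(f"no layoutVersion key in {path}")

    # Back up the current file before overwriting it.
    backup = path.with_name(path.name + ".bak")
    shutil.copy2(path, backup)
    path.write_text("\n".join(out) + "\n")
    return backup
```

For example, set_layout_version("/data/hdfs/data/current/VERSION", -40) would 
apply the value from the reported exception.  Only the layoutVersion line is 
touched, so the other fields in the file stay as they were.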


> Datanode cannot rollback because LayoutVersion incorrect
> --------------------------------------------------------
>
>                 Key: HDFS-6137
>                 URL: https://issues.apache.org/jira/browse/HDFS-6137
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: datanode
>    Affects Versions: 2.0.5-alpha
>            Reporter: Fengdong Yu
>
> Upgrading from hadoop-2.0.5-alpha (QJM HA enabled) to the latest trunk (HA 
> disabled) is successful. After stopping the cluster and rolling back, it 
> throws an exception:
> {code}
> 2014-03-21 18:33:19,384 FATAL 
> org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for 
> block pool Block pool BP-1123524590-10.204.8.135-1395397158134 (storage id 
> DS-1123524590-10.204.8.135-50010-1395397185148) service to 
> 10-204-8-135/10.204.8.135:9000
> org.apache.hadoop.hdfs.server.common.IncorrectVersionException: Unexpected 
> version of storage directory 
> /data/hdfs/data/current/BP-1123524590-10.204.8.135-1395397158134. Reported: 
> -55. Expecting = -40.
>         at 
> org.apache.hadoop.hdfs.server.common.Storage.setLayoutVersion(Storage.java:1083)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.setFieldsFromProperties(BlockPoolSliceStorage.java:217)
>         at 
> org.apache.hadoop.hdfs.server.common.Storage.readProperties(Storage.java:922)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.doTransition(BlockPoolSliceStorage.java:244)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceStorage.recoverTransitionRead(BlockPoolSliceStorage.java:145)
>         at 
> org.apache.hadoop.hdfs.server.datanode.DataStorage.recoverTransitionRead(DataStorage.java:234)
>         at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initStorage(DataNode.java:913)
>         at 
> org.apache.hadoop.hdfs.server.datanode.DataNode.initBlockPool(DataNode.java:884)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BPOfferService.verifyAndSetNamespaceInfo(BPOfferService.java:280)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.connectToNNAndHandshake(BPServiceActor.java:222)
>         at 
> org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:664)
>         at java.lang.Thread.run(Thread.java:744)
> {code}
>   
> I looked at the datanode dir: $datanode.dir/VERSION is always new. When we 
> upgrade, this file is overwritten, so it MUST fail during rollback.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
