[ 
https://issues.apache.org/jira/browse/HDFS-11209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiaoyu Yao updated HDFS-11209:
------------------------------
    Attachment: HDFS-11209.00.patch

Attaching a patch that checks the NN rolling upgrade status before updating the 
VERSION file on the SNN and Backup NN. The original code, which checks the SNN 
namesystem's rollingUpgrade state, does not work because the SNN never starts 
with the RollingUpgrade option. The Backup NN should have a similar issue. 

Will add a unit test later.
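
For illustration only, a minimal sketch of the idea behind the check. This is 
not the contents of HDFS-11209.00.patch; the class, enum, and method names are 
hypothetical and are not Hadoop APIs. It assumes the checkpointing node can 
learn the primary NN's rolling upgrade state and bases the decision on that 
state rather than on its own namesystem.

```java
// Hypothetical sketch: none of these names exist in Hadoop.
final class CheckpointVersionGuard {

    /** Rolling upgrade state as reported by the primary NN (assumed input). */
    enum RollingUpgradeState { NONE, IN_PROGRESS, FINALIZED }

    /**
     * Decide whether the checkpointing node (SNN or Backup NN) may rewrite
     * its VERSION file. The decision uses the primary NN's state, because
     * the SNN itself is never started with a rolling upgrade option.
     */
    static boolean mayUpdateVersionFile(RollingUpgradeState primaryState) {
        // While the rolling upgrade is not finalized, keep the old VERSION
        // so the checkpoint image still matches the primary NN's storage info.
        return primaryState != RollingUpgradeState.IN_PROGRESS;
    }

    public static void main(String[] args) {
        for (RollingUpgradeState s : RollingUpgradeState.values()) {
            System.out.println(s + " -> update VERSION: " + mayUpdateVersionFile(s));
        }
    }
}
```

The design point is only where the state comes from: the primary NN, not the 
local namesystem.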

> SNN can't checkpoint when rolling upgrade is not finalized
> ----------------------------------------------------------
>
>                 Key: HDFS-11209
>                 URL: https://issues.apache.org/jira/browse/HDFS-11209
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: rolling upgrades
>    Affects Versions: 2.8.0, 3.0.0-alpha1
>            Reporter: Xiaoyu Yao
>            Assignee: Xiaoyu Yao
>            Priority: Critical
>         Attachments: HDFS-11209.00.patch
>
>
> A similar problem was fixed in HDFS-7185. A recent change in HDFS-8432 
> brings it back. 
> With HDFS-8432, the primary NN does not update the VERSION file to the new 
> version after running with the "rollingUpgrade" option until the upgrade is 
> finalized. This is to support more downgrade use cases.
> However, the checkpoint on the SNN incorrectly updates the VERSION file 
> while the rolling upgrade is not yet finalized. As a result, the SNN 
> checkpoints successfully but fails to push the image to the primary NN 
> because its version is higher than the primary NN's, as shown below.
> {code}
> 2016-12-02 05:25:31,918 ERROR namenode.SecondaryNameNode 
> (SecondaryNameNode.java:doWork(399)) - Exception in doCheckpoint
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage$HttpPutFailedException:
>  Image uploading failed, status: 403, url: 
> http://NN:50070/imagetransfer?txid=345404754&imageFile=IMAGE&File-Le..., 
> message: This namenode has storage info -60:221856466:1444080250181:clusterX 
> but the secondary expected -63:221856466:1444080250181:clusterX
> {code} 
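
The mismatch reported in the log above can be made concrete with a small 
sketch. It assumes the four colon-separated fields are layout version, 
namespace ID, creation time, and cluster ID, and that HDFS layout versions are 
negative numbers where a more negative value is newer. The class and method 
names are hypothetical and are not Hadoop APIs.

```java
// Hypothetical helper: only the colon-separated strings come from the log.
final class StorageInfoSketch {
    final int layoutVersion;
    final long namespaceId;
    final long cTime;
    final String clusterId;

    StorageInfoSketch(int layoutVersion, long namespaceId, long cTime, String clusterId) {
        this.layoutVersion = layoutVersion;
        this.namespaceId = namespaceId;
        this.cTime = cTime;
        this.clusterId = clusterId;
    }

    /** Parse "layoutVersion:namespaceId:cTime:clusterId" (assumed field order). */
    static StorageInfoSketch parse(String s) {
        String[] parts = s.split(":", 4);
        if (parts.length != 4) {
            throw new IllegalArgumentException("expected 4 fields: " + s);
        }
        return new StorageInfoSketch(Integer.parseInt(parts[0]),
                Long.parseLong(parts[1]), Long.parseLong(parts[2]), parts[3]);
    }

    /** True only if every field is identical. */
    boolean matches(StorageInfoSketch other) {
        return layoutVersion == other.layoutVersion
                && namespaceId == other.namespaceId
                && cTime == other.cTime
                && clusterId.equals(other.clusterId);
    }

    public static void main(String[] args) {
        StorageInfoSketch primary = parse("-60:221856466:1444080250181:clusterX");
        StorageInfoSketch secondary = parse("-63:221856466:1444080250181:clusterX");
        System.out.println("matches: " + primary.matches(secondary));
        System.out.println("secondary layout is newer: "
                + (secondary.layoutVersion < primary.layoutVersion));
    }
}
```

Only the first field differs between the two strings, which is consistent with 
the SNN having written the new layout version to its VERSION file while the 
primary NN still reports the old one.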



