[ 
https://issues.apache.org/jira/browse/HDFS-11209?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15826619#comment-15826619
 ] 

Arpit Agarwal commented on HDFS-11209:
--------------------------------------

+1 for the v04 patch assuming the test failures are unrelated.

One minor point: the isRollingUpgrade RPC need not check for superuser 
privilege, since the call is harmless. This doesn't affect correctness, though, 
because the SNN would be running as the hdfs superuser.

> SNN can't checkpoint when rolling upgrade is not finalized
> ----------------------------------------------------------
>
>                 Key: HDFS-11209
>                 URL: https://issues.apache.org/jira/browse/HDFS-11209
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: rolling upgrades
>    Affects Versions: 2.8.0, 3.0.0-alpha1
>            Reporter: Xiaoyu Yao
>            Assignee: Xiaoyu Yao
>            Priority: Critical
>         Attachments: HDFS-11209.00.patch, HDFS-11209.01.patch, 
> HDFS-11209.02.patch, HDFS-11209.03.patch, HDFS-11209.04.patch
>
>
> A similar problem was fixed in HDFS-7185. The recent change in HDFS-8432 
> brings it back. 
> With HDFS-8432, the primary NN does not update the VERSION file to the new 
> version after running with the "rollingUpgrade" option until the upgrade is 
> finalized. This is to support more downgrade use cases.
> However, the checkpoint on the SNN incorrectly updates the VERSION file 
> while the rolling upgrade is not yet finalized on the primary NN. As a result, 
> the SNN checkpoints successfully but fails to push the checkpoint to the 
> primary NN because its version is higher than the primary NN's, as shown below.
> {code}
> 2016-12-02 05:25:31,918 ERROR namenode.SecondaryNameNode 
> (SecondaryNameNode.java:doWork(399)) - Exception in doCheckpoint
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage$HttpPutFailedException:
>  Image uploading failed, status: 403, url: 
> http://NN:50070/imagetransfer?txid=345404754&imageFile=IMAGE&File-Le..., 
> message: This namenode has storage info -60:221856466:1444080250181:clusterX 
> but the secondary expected -63:221856466:1444080250181:clusterX
> {code} 
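
For readers unfamiliar with the message in the log above, here is a minimal 
sketch of the kind of comparison that produces it. It is not the actual HDFS 
code. It assumes the colon-separated string carries four fields in the order 
layout version, namespace ID, cTime, cluster ID, and that a more negative 
layout version means a newer on-disk format. The class and method names 
(StorageInfoSketch, parse, describeMismatch) are hypothetical and exist only 
for illustration.

```java
public class StorageInfoSketch {

    /** Hypothetical holder for the four colon-separated fields in the log message. */
    public static final class StorageInfo {
        public final int layoutVersion;
        public final long namespaceId;
        public final long cTime;
        public final String clusterId;

        StorageInfo(int layoutVersion, long namespaceId, long cTime, String clusterId) {
            this.layoutVersion = layoutVersion;
            this.namespaceId = namespaceId;
            this.cTime = cTime;
            this.clusterId = clusterId;
        }

        /** Parses "layoutVersion:namespaceId:cTime:clusterId" (assumed field order). */
        public static StorageInfo parse(String s) {
            String[] parts = s.split(":", 4);
            if (parts.length != 4) {
                throw new IllegalArgumentException("Expected 4 fields, got: " + s);
            }
            return new StorageInfo(
                Integer.parseInt(parts[0]),
                Long.parseLong(parts[1]),
                Long.parseLong(parts[2]),
                parts[3]);
        }

        /** True only if every field is identical. */
        public boolean matches(StorageInfo other) {
            return layoutVersion == other.layoutVersion
                && namespaceId == other.namespaceId
                && cTime == other.cTime
                && clusterId.equals(other.clusterId);
        }
    }

    /** Returns a short description of the first differing field, or "match". */
    public static String describeMismatch(String primary, String secondary) {
        StorageInfo nn = StorageInfo.parse(primary);
        StorageInfo snn = StorageInfo.parse(secondary);
        if (nn.matches(snn)) {
            return "match";
        }
        if (nn.layoutVersion != snn.layoutVersion) {
            return "layoutVersion differs: NN=" + nn.layoutVersion
                + " SNN=" + snn.layoutVersion;
        }
        return "other field differs";
    }

    public static void main(String[] args) {
        // Values copied from the log message in the issue description.
        System.out.println(describeMismatch(
            "-60:221856466:1444080250181:clusterX",
            "-63:221856466:1444080250181:clusterX"));
    }
}
```

With the two strings from the log, only the first field differs (-60 on the NN, 
-63 on the SNN), which is consistent with the description: the SNN has already 
written the new version while the primary NN still holds the pre-upgrade one.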



