[
https://issues.apache.org/jira/browse/HDFS-7185?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jing Zhao updated HDFS-7185:
----------------------------
Attachment: HDFS-7185.004.patch
After an offline discussion with Nicholas, this 004 patch adds a stricter
check for rollingUpgrade rollback. Specifically, when doing a rolling rollback,
we check that the software's layout version is the same as the fsimage's
layout version.
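
The check described above could be sketched roughly as follows. This is only an illustration and not the code in the patch: the class and method names are hypothetical, and the layout versions -55 and -59 are borrowed from the logs in the issue description.

```java
import java.io.IOException;

/** Illustrative only: hypothetical names, not the code in HDFS-7185.004.patch. */
final class RollingRollbackCheck {

    /** True when the running software and the fsimage use the same layout version. */
    static boolean layoutVersionsMatch(int softwareLayoutVersion, int imageLayoutVersion) {
        return softwareLayoutVersion == imageLayoutVersion;
    }

    /** Rejects a rolling rollback whose fsimage layout version differs from the software's. */
    static void checkRollingRollback(int softwareLayoutVersion, int imageLayoutVersion)
            throws IOException {
        if (!layoutVersionsMatch(softwareLayoutVersion, imageLayoutVersion)) {
            throw new IOException("Rolling rollback rejected: software layout version "
                + softwareLayoutVersion + " does not match fsimage layout version "
                + imageLayoutVersion);
        }
    }

    public static void main(String[] args) {
        // -55 and -59 are the layout versions that appear in the issue's logs.
        try {
            checkRollingRollback(-55, -55);   // same version: accepted
            System.out.println("rollback accepted");
            checkRollingRollback(-59, -55);   // mismatch: rejected
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

The point of the sketch is only that a rolling rollback is refused unless the two layout versions are identical; where the real check lives in the NameNode startup path is not shown here.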
> The active NameNode will not accept an fsimage sent from the standby during
> rolling upgrade
> -------------------------------------------------------------------------------------------
>
> Key: HDFS-7185
> URL: https://issues.apache.org/jira/browse/HDFS-7185
> Project: Hadoop HDFS
> Issue Type: Bug
> Components: namenode
> Affects Versions: 2.4.0
> Reporter: Colin Patrick McCabe
> Assignee: Jing Zhao
> Attachments: HDFS-7185.000.patch, HDFS-7185.001.patch,
> HDFS-7185.002.patch, HDFS-7185.003.patch, HDFS-7185.004.patch
>
>
> The active NameNode will not accept an fsimage sent from the standby during
> rolling upgrade. The active fails with the exception:
> {code}
> 18:25:07,620 WARN ImageServlet:198 - Received an invalid request file
> transfer request from a secondary with storage info
> -59:65195028:0:CID-385de4d7-64e4-4dde-9f5d-0a6e431987f6
> 18:25:07,620 WARN log:76 - Committed before 410 PutImage failed.
> java.io.IOException: This namenode has storage info
> -55:65195028:0:CID-385de4d7-64e4-4dde-9f5d-0a6e431987f6 but the secondary
> expected -59:65195028:0:CID-385de4d7-64e4-4dde-9f5d-0a6e431987f6
> at
> org.apache.hadoop.hdfs.server.namenode.ImageServlet.validateRequest(ImageServlet.java:200)
> at
> org.apache.hadoop.hdfs.server.namenode.ImageServlet.doPut(ImageServlet.java:443)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:730)
> {code}
> On the standby, the exception is:
> {code}
> java.io.IOException: Exception during image upload:
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage$HttpPutFailedException:
> This namenode has storage info
> -55:65195028:0:CID-385de4d7-64e4-4dde-9f5d-0a6e431987f6 but the secondary
> expected
> -59:65195028:0:CID-385de4d7-64e4-4dde-9f5d-0a6e431987f6
> at
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer.doCheckpoint(StandbyCheckpointer.java:218)
> at
> org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer.access$1400(StandbyCheckpointer.java:62)
> {code}
> This seems to happen because the VERSION file is still at -55 (the old
> layout version) even after the rolling upgrade has started. When the
> rolling upgrade is finalized with {{hdfs dfsadmin -rollingUpgrade finalize}},
> both VERSION files get set to the new version, and the problem goes away.
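
To make the mismatch in the logs above concrete, here is a small sketch that splits the two storage info strings and compares them field by field. It assumes the colon-separated fields are layout version, namespace ID, cTime, and cluster ID, which is consistent with the -55/-59 values quoted in the report; the class and method names are hypothetical and are not part of ImageServlet.

```java
/** Illustrative parser for the colon-separated storage info strings in the logs. */
final class StorageInfoFields {
    final int layoutVersion;
    final int namespaceId;
    final long cTime;
    final String clusterId;

    private StorageInfoFields(int layoutVersion, int namespaceId, long cTime, String clusterId) {
        this.layoutVersion = layoutVersion;
        this.namespaceId = namespaceId;
        this.cTime = cTime;
        this.clusterId = clusterId;
    }

    /** Assumed field order: layoutVersion:namespaceID:cTime:clusterID. */
    static StorageInfoFields parse(String s) {
        String[] parts = s.split(":", 4);
        if (parts.length != 4) {
            throw new IllegalArgumentException("Expected 4 colon-separated fields: " + s);
        }
        return new StorageInfoFields(Integer.parseInt(parts[0]), Integer.parseInt(parts[1]),
            Long.parseLong(parts[2]), parts[3]);
    }

    /** True when everything except the layout version is identical. */
    boolean differsOnlyInLayoutVersion(StorageInfoFields other) {
        return layoutVersion != other.layoutVersion
            && namespaceId == other.namespaceId
            && cTime == other.cTime
            && clusterId.equals(other.clusterId);
    }

    public static void main(String[] args) {
        // Both strings are copied from the exception text in the issue description.
        StorageInfoFields active = parse("-55:65195028:0:CID-385de4d7-64e4-4dde-9f5d-0a6e431987f6");
        StorageInfoFields standby = parse("-59:65195028:0:CID-385de4d7-64e4-4dde-9f5d-0a6e431987f6");
        System.out.println("only layout version differs: "
            + active.differsOnlyInLayoutVersion(standby));
    }
}
```

Under that assumption, the two NameNodes agree on namespace and cluster and disagree only on the layout version, which matches the explanation that the active's VERSION file has not yet been bumped.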