[
https://issues.apache.org/jira/browse/HDFS-7185?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14170442#comment-14170442
]
Tsz Wo Nicholas Sze commented on HDFS-7185:
-------------------------------------------
- I think writing the layout version (lv) in FSImage.recoverStorageDirs(..) may be too early, since that method is still checking each storage directory and some of them may fail. We could then end up with some directories updated but not all of them. How about we change the version check and defer writing lv until the code below?
{code}
+ if (fsImage.getLayoutVersion() != HdfsConstants.NAMENODE_LAYOUT_VERSION
+ && StartupOption.ROLLINGUPGRADE == startOpt) {
+ fsImage.updateStorageVersion();
+ }
{code}
- I think the above code should be more strict:
-* for downgrade, the current and on-disk versions must be the same; otherwise, throw an exception.
-* for started, update lv only if the current version is newer than the on-disk version; if the current version is older than the on-disk version, throw an exception.
-* for rollback, update lv only if the current version is older than the on-disk version; if the current version is newer than the on-disk version, throw an exception.
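The three cases above could be sketched as follows. This is a hypothetical helper, not the actual HDFS patch; the names {{shouldUpdateStorageVersion}} and {{RollingUpgradeOption}} are illustrative only. Note that HDFS layout versions are negative integers and a newer layout has a more negative value, e.g. -59 is newer than -55.

```java
import java.io.IOException;

// Hypothetical sketch of the stricter layout-version check; not the
// actual HDFS code. Layout versions are negative, and a newer layout
// is MORE negative (e.g. -59 is newer than -55).
public class LayoutVersionCheck {
  public enum RollingUpgradeOption { STARTED, DOWNGRADE, ROLLBACK }

  /** Returns true if the on-disk layout version should be rewritten. */
  public static boolean shouldUpdateStorageVersion(int currentLv,
      int onDiskLv, RollingUpgradeOption opt) throws IOException {
    switch (opt) {
      case DOWNGRADE:
        // Downgrade requires identical versions; never rewrite VERSION.
        if (currentLv != onDiskLv) {
          throw new IOException("Downgrade requires matching layout versions:"
              + " current=" + currentLv + ", on-disk=" + onDiskLv);
        }
        return false;
      case STARTED:
        // Update only when the running software is newer than the disk.
        if (currentLv > onDiskLv) {
          throw new IOException("Current layout version " + currentLv
              + " is older than on-disk version " + onDiskLv);
        }
        return currentLv < onDiskLv;
      case ROLLBACK:
        // Update only when the running software is older than the disk.
        if (currentLv < onDiskLv) {
          throw new IOException("Current layout version " + currentLv
              + " is newer than on-disk version " + onDiskLv);
        }
        return currentLv > onDiskLv;
      default:
        return false;
    }
  }
}
```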
> The active NameNode will not accept an fsimage sent from the standby during
> rolling upgrade
> -------------------------------------------------------------------------------------------
>
> Key: HDFS-7185
> URL: https://issues.apache.org/jira/browse/HDFS-7185
> Project: Hadoop HDFS
> Issue Type: Bug
> Affects Versions: 2.4.0
> Reporter: Colin Patrick McCabe
> Assignee: Jing Zhao
> Attachments: HDFS-7185.000.patch, HDFS-7185.001.patch
>
>
> The active NameNode will not accept an fsimage sent from the standby during
> rolling upgrade. The active fails with the exception:
> {code}
> 18:25:07,620 WARN ImageServlet:198 - Received an invalid request file
> transfer request from a secondary with storage info
> -59:65195028:0:CID-385de4d7-64e4-4dde-9f5d-0a6e431987f6
> 18:25:07,620 WARN log:76 - Committed before 410 PutImage failed.
> java.io.IOException: This namenode has storage info
> -55:65195028:0:CID-385de4d7-64e4-4dde-9f5d-0a6e431987f6 but the secondary
> expected -59:65195028:0:CID-385de4d7-64e4-4dde-9f5d-0a6e431987f6
> at org.apache.hadoop.hdfs.server.namenode.ImageServlet.validateRequest(ImageServlet.java:200)
> at org.apache.hadoop.hdfs.server.namenode.ImageServlet.doPut(ImageServlet.java:443)
> at javax.servlet.http.HttpServlet.service(HttpServlet.java:730)
> {code}
> On the standby, the exception is:
> {code}
> java.io.IOException: Exception during image upload:
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage$HttpPutFailedException:
> This namenode has storage info
> -55:65195028:0:CID-385de4d7-64e4-4dde-9f5d-0a6e431987f6 but the secondary
> expected -59:65195028:0:CID-385de4d7-64e4-4dde-9f5d-0a6e431987f6
> at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer.doCheckpoint(StandbyCheckpointer.java:218)
> at org.apache.hadoop.hdfs.server.namenode.ha.StandbyCheckpointer.access$1400(StandbyCheckpointer.java:62)
> {code}
> This seems to be a consequence of the fact that the VERSION file still is at
> -55 (the old version) even after the rolling upgrade has started. When the
> rolling upgrade is finalized with {{hdfs dfsadmin -rollingUpgrade finalize}},
> both VERSION files get set to the new version, and the problem goes away.
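For reference, the storage info string in the errors above is the colon-separated form layoutVersion:namespaceID:cTime:clusterID, so a VERSION file still at -55 makes the active's storage info compare unequal to the standby's -59 even though every other field matches. A minimal illustration of the mismatch (hypothetical helper, not the actual ImageServlet code):

```java
// Hypothetical illustration; not the actual ImageServlet code.
// The storage info seen in the log is colon-separated:
//   layoutVersion:namespaceID:cTime:clusterID
public class StorageInfoMismatch {
  /** The real check compares the whole storage info string, so a
   *  differing layout version alone rejects the checkpoint upload. */
  public static boolean matches(String active, String standby) {
    return active.equals(standby);
  }

  /** First colon-separated field: the layout version. */
  public static String layoutVersion(String storageInfo) {
    return storageInfo.split(":", 2)[0];
  }
}
```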
--
This message was sent by Atlassian JIRA
(v6.3.4#6332)