[
https://issues.apache.org/jira/browse/HDFS-17178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17764406#comment-17764406
]
ASF GitHub Bot commented on HDFS-17178:
---------------------------------------
goiri merged PR #6018:
URL: https://github.com/apache/hadoop/pull/6018
> BootstrapStandby needs to handle RollingUpgrade
> ------------------------------------------------
>
> Key: HDFS-17178
> URL: https://issues.apache.org/jira/browse/HDFS-17178
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Reporter: Danny Becker
> Assignee: Danny Becker
> Priority: Minor
> Fix For: 3.4.0
>
>
> During a rolling upgrade, BootstrapStandby fails with an exception because
> the NameNodes report different NameNodeLayoutVersions. This mismatch can be
> safely ignored during a rolling upgrade, because different
> NameNodeLayoutVersions are expected.
> * Without this change, NameNodes cannot recover with BootstrapStandby if they
> go through destructive repair before the rolling upgrade has been finalized.
>
> Error during BootstrapStandby before the change:
> {code:java}
> =====================================================
> About to bootstrap Standby ID nn2 from:
> Nameservice ID: MTPrime-MWHE01-0
> Other Namenode ID: nn1
> Other NN's HTTP address: https://MWHEEEAP002D9A2:81
> Other NN's IPC address: MWHEEEAP002D9A2.ap.gbl/10.59.208.18:8020
> Namespace ID: 895912530
> Block pool ID: BP-1556042256-10.99.154.61-1663325602669
> Cluster ID: MWHE01
> Layout version: -64
> isUpgradeFinalized: true
> =====================================================
> 2023-08-28T19:35:06,940 ERROR [main] namenode.NameNode: Failed to start
> namenode.
> java.io.IOException: java.lang.RuntimeException:
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage$HttpGetFailedException:
> Image transfer servlet at
> https://MWHEEEAP002D9A2:81/imagetransfer?getimage=1&txid=25683470&storageInfo=-64:895912530:1663325602669:MWHE01&bootstrapstandby=true
> failed with status code 403
> Response message:
> This namenode has storage info -63:895912530:1663325602669:MWHE01 but the
> secondary expected -64:895912530:1663325602669:MWHE01
> at
> org.apache.hadoop.hdfs.server.namenode.ha.BootstrapStandby.run(BootstrapStandby.java:583)
> ~[hadoop-hdfs-2.9.2-MT-SNAPSHOT.jar:?]
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1717)
> ~[hadoop-hdfs-2.9.2-MT-SNAPSHOT.jar:?]
> at
> org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1819)
> [hadoop-hdfs-2.9.2-MT-SNAPSHOT.jar:?]
> Caused by: java.lang.RuntimeException:
> org.apache.hadoop.hdfs.server.namenode.TransferFsImage$HttpGetFailedException:
> Image transfer servlet at https://MWHEEEAP002D9A2:81{code}
> This happens because the namespaceInfo sent from the proxy node does not
> include the effective layout version, so BootstrapStandby sends a request
> whose storageinfo param uses the service layout version. The proxy node
> refuses the request because it compares the storageinfo param against its
> own storage info, which uses the effective layout version rather than the
> service layout version.
> To fix this, we can modify the proxy.versionRequest() call stack to set the
> layout version using the effective layout version on the proxy node. We can
> then add logic to BootstrapStandby to properly handle the case where the
> proxy node is in rolling upgrade.
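
For readers following the description above, here is a minimal, self-contained sketch of the mismatch and of the proposed direction. It is not the code from PR #6018. Every class and method name in it (BootstrapSketch, StorageInfo, servletAccepts, versionRequest, standbyMayProceed) is hypothetical and is not a Hadoop API. The layout versions -63 (effective) and -64 (service) and the other field values are taken from the log above. The rule used in standbyMayProceed is an assumption about what "properly handle the case where the proxy node is in rolling upgrade" could look like.

```java
import java.util.Objects;

/** Illustrative model only; none of these names are Hadoop APIs. */
final class BootstrapSketch {

    /** Stand-in for the four colon-separated fields seen in the storageInfo param. */
    static final class StorageInfo {
        final int layoutVersion;
        final long namespaceId;
        final long cTime;
        final String clusterId;

        StorageInfo(int layoutVersion, long namespaceId, long cTime, String clusterId) {
            this.layoutVersion = layoutVersion;
            this.namespaceId = namespaceId;
            this.cTime = cTime;
            this.clusterId = Objects.requireNonNull(clusterId);
        }

        StorageInfo withLayoutVersion(int newLayoutVersion) {
            return new StorageInfo(newLayoutVersion, namespaceId, cTime, clusterId);
        }

        String toColonSeparatedString() {
            return layoutVersion + ":" + namespaceId + ":" + cTime + ":" + clusterId;
        }
    }

    /** Models the proxy-side check: the request is refused unless the strings match. */
    static boolean servletAccepts(StorageInfo proxyStorage, String requestedStorageInfo) {
        return proxyStorage.toColonSeparatedString().equals(requestedStorageInfo);
    }

    /**
     * Models what the proxy reports to the bootstrapping node.
     * reportEffective=false: service layout version (behaviour described as the bug).
     * reportEffective=true: effective layout version (direction of the proposed fix).
     */
    static StorageInfo versionRequest(StorageInfo proxyStorage, int serviceLayoutVersion,
                                      boolean reportEffective) {
        return reportEffective ? proxyStorage : proxyStorage.withLayoutVersion(serviceLayoutVersion);
    }

    /** Assumed standby-side rule: tolerate a layout mismatch only during rolling upgrade. */
    static boolean standbyMayProceed(int remoteLayoutVersion, int localServiceLayoutVersion,
                                     boolean rollingUpgradeInProgress) {
        return remoteLayoutVersion == localServiceLayoutVersion || rollingUpgradeInProgress;
    }

    public static void main(String[] args) {
        // Values from the log: proxy storage is at -63 while the software is at -64.
        StorageInfo proxyStorage = new StorageInfo(-63, 895912530L, 1663325602669L, "MWHE01");
        int serviceLayoutVersion = -64;

        StorageInfo before = versionRequest(proxyStorage, serviceLayoutVersion, false);
        StorageInfo after = versionRequest(proxyStorage, serviceLayoutVersion, true);

        System.out.println("before: request " + before.toColonSeparatedString() + " accepted="
                + servletAccepts(proxyStorage, before.toColonSeparatedString()));
        System.out.println("after:  request " + after.toColonSeparatedString() + " accepted="
                + servletAccepts(proxyStorage, after.toColonSeparatedString()));
        System.out.println("standby may proceed: "
                + standbyMayProceed(after.layoutVersion, serviceLayoutVersion, true));
    }
}
```

In this model the "before" request carries -64 and is refused, which matches the 403 in the log. The "after" request carries -63 and passes the string comparison. The standby then has to accept a remote layout version that differs from its own, which the sketch permits only while a rolling upgrade is in progress.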