[ https://issues.apache.org/jira/browse/HDFS-17178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Íñigo Goiri resolved HDFS-17178.
--------------------------------
Hadoop Flags: Reviewed
Resolution: Fixed

> BootstrapStandby needs to handle RollingUpgrade
> ------------------------------------------------
>
> Key: HDFS-17178
> URL: https://issues.apache.org/jira/browse/HDFS-17178
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: namenode
> Reporter: Danny Becker
> Assignee: Danny Becker
> Priority: Minor
> Labels: pull-request-available
> Fix For: 3.4.0
>
>
> During a rolling upgrade, BootstrapStandby fails with an exception because the
> NameNodes report different NameNodeLayoutVersions. This mismatch can safely be
> ignored during a rolling upgrade, where different NameNodeLayoutVersions are expected.
> * NameNodes cannot recover with BootstrapStandby if they go through
> destructive repair before the rolling upgrade has been finalized.
> Error during BootstrapStandby before the change:
> {code:java}
> =====================================================
> About to bootstrap Standby ID nn2 from:
> Nameservice ID: MTPrime-MWHE01-0
> Other Namenode ID: nn1
> Other NN's HTTP address: https://MWHEEEAP002D9A2:81
> Other NN's IPC address: MWHEEEAP002D9A2.ap.gbl/10.59.208.18:8020
> Namespace ID: 895912530
> Block pool ID: BP-1556042256-10.99.154.61-1663325602669
> Cluster ID: MWHE01
> Layout version: -64
> isUpgradeFinalized: true
> =====================================================
> 2023-08-28T19:35:06,940 ERROR [main] namenode.NameNode: Failed to start namenode.
> java.io.IOException: java.lang.RuntimeException: org.apache.hadoop.hdfs.server.namenode.TransferFsImage$HttpGetFailedException: Image transfer servlet at https://MWHEEEAP002D9A2:81/imagetransfer?getimage=1&txid=25683470&storageInfo=-64:895912530:1663325602669:MWHE01&bootstrapstandby=true failed with status code 403
> Response message:
> This namenode has storage info -63:895912530:1663325602669:MWHE01 but the secondary expected -64:895912530:1663325602669:MWHE01
>         at org.apache.hadoop.hdfs.server.namenode.ha.BootstrapStandby.run(BootstrapStandby.java:583) ~[hadoop-hdfs-2.9.2-MT-SNAPSHOT.jar:?]
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1717) ~[hadoop-hdfs-2.9.2-MT-SNAPSHOT.jar:?]
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1819) [hadoop-hdfs-2.9.2-MT-SNAPSHOT.jar:?]
> Caused by: java.lang.RuntimeException: org.apache.hadoop.hdfs.server.namenode.TransferFsImage$HttpGetFailedException: Image transfer servlet at https://MWHEEEAP002D9A2:81
> {code}
> This happens because the namespaceInfo returned by the proxy node does not
> carry the effective layout version. BootstrapStandby therefore sends a request
> whose storageInfo parameter uses the service layout version. The proxy node
> refuses that request because it compares the storageInfo parameter against its
> own storage info, which uses the effective layout version, not the service
> layout version.
> To fix this, we can modify the proxy.versionRequest() call stack so that the
> proxy node reports its effective layout version. We can then add logic to
> BootstrapStandby to handle the case where the proxy node is in a rolling upgrade.
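The mismatch and both halves of the proposed fix can be illustrated with a small, self-contained Java model. Everything in it is hypothetical. `StorageInfoCheckSketch`, `versionRequest`, `servletAccepts`, and `layoutCheckPasses` are illustrative names, not Hadoop APIs. The servlet check is simplified to an exact comparison of the `layoutVersion:namespaceID:cTime:clusterID` string, which is assumed from the `storageInfo` URL parameter in the log. The model also assumes the following, as the log suggests:

* The proxy's storage is at effective layout version -63, while its software (service) layout version is -64.
* The standby being bootstrapped also runs the -64 software.

```java
/**
 * Hypothetical, simplified model of the storage-info handshake described in
 * HDFS-17178. None of these names are Hadoop APIs; they only illustrate why the
 * service layout version led to a 403 and why the effective version avoids it.
 */
public class StorageInfoCheckSketch {

    /** Fields in the order assumed for the storageInfo URL parameter. */
    static final class StorageInfo {
        final int layoutVersion;
        final long namespaceId;
        final long cTime;
        final String clusterId;

        StorageInfo(int layoutVersion, long namespaceId, long cTime, String clusterId) {
            this.layoutVersion = layoutVersion;
            this.namespaceId = namespaceId;
            this.cTime = cTime;
            this.clusterId = clusterId;
        }

        /** Renders "layoutVersion:namespaceId:cTime:clusterId", as seen in the servlet URL. */
        String toColonSeparated() {
            return layoutVersion + ":" + namespaceId + ":" + cTime + ":" + clusterId;
        }
    }

    /** Proxy side: the image servlet refuses any request whose storageInfo differs from its own. */
    static boolean servletAccepts(StorageInfo proxyStorage, String requestedStorageInfo) {
        return proxyStorage.toColonSeparated().equals(requestedStorageInfo);
    }

    /**
     * What the proxy reports from its version request. Before the fix it used the
     * service (software) layout version; after it, the effective (on-disk) one.
     */
    static StorageInfo versionRequest(StorageInfo proxyStorage, int serviceLayoutVersion, boolean useEffective) {
        int lv = useEffective ? proxyStorage.layoutVersion : serviceLayoutVersion;
        return new StorageInfo(lv, proxyStorage.namespaceId, proxyStorage.cTime, proxyStorage.clusterId);
    }

    /** Standby side (hypothetical): a layout mismatch is tolerated only during a rolling upgrade. */
    static boolean layoutCheckPasses(int remoteLv, int localServiceLv, boolean rollingUpgradeInProgress) {
        return remoteLv == localServiceLv || rollingUpgradeInProgress;
    }

    public static void main(String[] args) {
        // Values from the log: the proxy's storage is still at -63 while its software reports -64.
        StorageInfo proxyStorage = new StorageInfo(-63, 895912530L, 1663325602669L, "MWHE01");
        int serviceLv = -64;

        // Compare the request parameter built before the fix (service LV) and after it (effective LV).
        for (boolean useEffective : new boolean[] {false, true}) {
            String param = versionRequest(proxyStorage, serviceLv, useEffective).toColonSeparated();
            System.out.println(param + " accepted=" + servletAccepts(proxyStorage, param));
        }

        // After the fix the proxy reports -63, so the standby's own -64 check needs the rolling-upgrade escape.
        System.out.println("layout check without rolling upgrade: " + layoutCheckPasses(-63, serviceLv, false));
        System.out.println("layout check during rolling upgrade: " + layoutCheckPasses(-63, serviceLv, true));
    }
}
```

In this model, the -64 parameter built from the service layout version is rejected, mirroring the 403 above. The -63 parameter built from the effective layout version is accepted. The last two lines show why BootstrapStandby needs the second change. Once the proxy reports -63, a strict comparison against the standby's -64 fails, so the mismatch must be tolerated while a rolling upgrade is in progress.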