[ https://issues.apache.org/jira/browse/HDFS-17356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17811219#comment-17811219 ]

wangzhihui commented on HDFS-17356:
-----------------------------------

Yes

> RBF: Add Configuration dfs.federation.router.ns.name Optimization
> -----------------------------------------------------------------
>
>                 Key: HDFS-17356
>                 URL: https://issues.apache.org/jira/browse/HDFS-17356
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: dfs, rbf
>            Reporter: wangzhihui
>            Priority: Minor
>         Attachments: screenshot-1.png, screenshot-2.png
>
>
>     When RBF federation is enabled in HDFS, and the HDFS server and the RBF 
> client share the same configuration, with the HDFS server processes 
> (NameNode, ZKFC) and the RBF client co-located on the same node, the 
> following exception occurs and the NameNode fails to start. The cause is 
> that the Router's NS has been added to the dfs.nameservices list. At 
> startup, the NameNode resolves which NS the current node belongs to, but it 
> finds multiple NS entries matching the local node's address; this fails the 
> existing validation logic and aborts NameNode startup. Currently the only 
> workaround is to maintain separate hdfs-site.xml files for the Router 
> client and the NameNode, but splitting the configuration works against our 
> unified management of cluster configuration. We therefore propose a new 
> solution that addresses this problem better.
> {code}
> 2023-10-30 15:53:24,613 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: registered UNIX signal handlers for [TERM, HUP, INT]
> 2023-10-30 15:53:24,672 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: createNameNode []
> 2023-10-30 15:53:24,760 INFO org.apache.hadoop.metrics2.impl.MetricsConfig: Loaded properties from hadoop-metrics2.properties
> 2023-10-30 15:53:24,842 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Scheduled Metric snapshot period at 10 second(s).
> 2023-10-30 15:53:24,842 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system started
> 2023-10-30 15:53:24,868 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: Failed to start namenode.
> org.apache.hadoop.HadoopIllegalArgumentException: Configuration has multiple addresses that match local node's address. Please configure the system with dfs.nameservice.id and dfs.ha.namenode.id
>         at org.apache.hadoop.hdfs.DFSUtil.getSuffixIDs(DFSUtil.java:1257)
>         at org.apache.hadoop.hdfs.DFSUtil.getNameServiceId(DFSUtil.java:1158)
>         at org.apache.hadoop.hdfs.DFSUtil.getNamenodeNameServiceId(DFSUtil.java:1113)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.getNameServiceId(NameNode.java:1822)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:1005)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:995)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1769)
>         at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1834)
> 2023-10-30 15:53:24,870 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1: org.apache.hadoop.HadoopIllegalArgumentException: Configuration has multiple addresses that match local node's address. Please configure the system with dfs.nameservice.id and dfs.ha.namenode.id
> 2023-10-30 15:53:24,874 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG: {code}
>  
> hdfs-site.xml
> {code:xml}
> <property>
>   <name>dfs.nameservices</name>
>   <value>mycluster1,mycluster2,ns-fed</value>
> </property>
> <property>
>   <name>dfs.ha.namenodes.ns-fed</name>
>   <value>r1</value>
> </property>
> <property>
>   <name>dfs.namenode.rpc-address.ns-fed.r1</name>
>   <value>node1.com:8888</value>
> </property>
> <property>
>   <name>dfs.ha.namenodes.mycluster1</name>
>   <value>nn1,nn2</value>
> </property>
> <property>
>   <name>dfs.namenode.http-address.mycluster1.nn1</name>
>   <value>node1.com:50070</value>
> </property>
> <property>
>   <name>dfs.namenode.http-address.mycluster1.nn2</name>
>   <value>node2.com:50070</value>
> </property>
> <property>
>   <name>dfs.ha.namenodes.mycluster2</name>
>   <value>nn1,nn2</value>
> </property>
> <property>
>   <name>dfs.namenode.http-address.mycluster2.nn1</name>
>   <value>node3.com:50070</value>
> </property>
> <property>
>   <name>dfs.namenode.http-address.mycluster2.nn2</name>
>   <value>node4.com:50070</value>
> </property>
> <property>
>   <name>dfs.client.failover.proxy.provider.ns-fed</name>
>   <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
> </property>
> <property>
>   <name>dfs.client.failover.random.order</name>
>   <value>true</value>
> </property> {code}
>  
> Solution
> Add a new configuration key, dfs.federation.router.ns.name, to 
> hdfs-site.xml to mark the Router's NS name, and filter that NS out of 
> dfs.nameservices during NameNode and ZKFC startup, so that only the real 
> NameNode nameservices are considered and this issue is avoided.
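> The intended filtering could be sketched roughly as follows. This is a 
> minimal illustration only; the class and method names below are 
> hypothetical and do not exist in HDFS today:
> {code:java}
> import java.util.Arrays;
> import java.util.LinkedHashSet;
> import java.util.Set;
>
> public class RouterNsFilter {
>     // Hypothetical helper: drop the Router's NS (the value of the proposed
>     // dfs.federation.router.ns.name key) from the dfs.nameservices list
>     // before NameNode/ZKFC resolve which NS the local node belongs to.
>     static Set<String> filterRouterNs(String nameservices, String routerNs) {
>         Set<String> result =
>             new LinkedHashSet<>(Arrays.asList(nameservices.split(",")));
>         result.remove(routerNs);
>         return result;
>     }
>
>     public static void main(String[] args) {
>         // With the hdfs-site.xml above: ns-fed is excluded, so the NameNode
>         // only matches its address against mycluster1 and mycluster2.
>         Set<String> filtered =
>             filterRouterNs("mycluster1,mycluster2,ns-fed", "ns-fed");
>         System.out.println(filtered); // [mycluster1, mycluster2]
>     }
> } {code}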



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
