[ 
https://issues.apache.org/jira/browse/HDFS-16508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17508488#comment-17508488
 ] 

tomscut commented on HDFS-16508:
--------------------------------

Hi [~willtoshare], please see HDFS-15509, HDFS-8277 and HDFS-16505. They seem
to be the same kind of problem.

> When nn1 fails at the very beginning, the admin command that waits for safe
> mode to exit fails
> ------------------------------------------------------------------------------------
>
>                 Key: HDFS-16508
>                 URL: https://issues.apache.org/jira/browse/HDFS-16508
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: tools
>    Affects Versions: 3.3.1
>            Reporter: May
>            Priority: Major
>
> HA is enabled, and we have two NameNodes: nn1 and nn2.
> When starting the cluster, nn1 fails at the very beginning, and nn2
> transitions to the active state. The cluster serves requests normally.
> However, when we try to get the safe mode state or wait for safe mode to
> exit, our dfsadmin command fails with an IOException: cannot connect to nn1.
> The *root cause* seems to be located here:
> {code:java}
> // DFSAdmin.class
> public void setSafeMode(String[] argv, int idx) throws IOException {
>   …
>   if (isHaEnabled) {
>     String nsId = dfsUri.getHost();
>     List<ProxyAndInfo<ClientProtocol>> proxies =
>         HAUtil.getProxiesForAllNameNodesInNameservice(
>             dfsConf, nsId, ClientProtocol.class);
>     for (ProxyAndInfo<ClientProtocol> proxy : proxies) {
>       ClientProtocol haNn = proxy.getProxy();
>       // The loop always queries the first NameNode (nn1) first, so the
>       // whole command fails with an IOException when nn1 is down.
>       boolean inSafeMode = haNn.setSafeMode(action, false);
>       if (waitExitSafe) {
>         inSafeMode = waitExitSafeMode(haNn, inSafeMode);
>       }
>       System.out.println("Safe mode is " + (inSafeMode ? "ON" : "OFF")
>           + " in " + proxy.getAddress());
>     }
>   }
>   …
> }
> {code}
> Actually, I'm curious: do we need to get/wait on every NameNode here when HA
> is enabled?
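To make the failure mode of the quoted loop concrete, here is a minimal, self-contained Java sketch. `SafeModeClient`, `SafeModeReport`, `failFast`, `tolerant` and `fake` are hypothetical names standing in for the per-NameNode `ClientProtocol` proxies; they are not Hadoop APIs, the addresses are made up, and the tolerant variant is only one possible answer to the question above, not the fix adopted upstream. The `waitExitSafe` polling path is not modeled.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for one NameNode's ClientProtocol proxy (NOT a Hadoop API).
interface SafeModeClient {
  String address();

  // Returns true if this NameNode reports safe mode ON; throws if unreachable.
  boolean getSafeMode() throws IOException;
}

final class SafeModeReport {
  // Fail-fast variant, mirroring the quoted DFSAdmin loop: the first
  // unreachable NameNode (nn1 here) aborts the whole command.
  static Map<String, Boolean> failFast(List<SafeModeClient> nns) throws IOException {
    Map<String, Boolean> states = new LinkedHashMap<>();
    for (SafeModeClient nn : nns) {
      states.put(nn.address(), nn.getSafeMode()); // propagates nn1's IOException
    }
    return states;
  }

  // Tolerant variant (an assumption, not the upstream behavior): record each
  // unreachable NameNode in `unreachable` and keep querying the rest.
  static Map<String, Boolean> tolerant(List<SafeModeClient> nns, List<String> unreachable) {
    Map<String, Boolean> states = new LinkedHashMap<>();
    for (SafeModeClient nn : nns) {
      try {
        states.put(nn.address(), nn.getSafeMode());
      } catch (IOException e) {
        unreachable.add(nn.address());
      }
    }
    return states;
  }

  // Hypothetical fake NameNode: a null state means "cannot connect".
  static SafeModeClient fake(String addr, Boolean inSafeMode) {
    return new SafeModeClient() {
      public String address() {
        return addr;
      }

      public boolean getSafeMode() throws IOException {
        if (inSafeMode == null) {
          throw new IOException("cannot connect to " + addr);
        }
        return inSafeMode;
      }
    };
  }
}
```

With a fake nn1 that cannot be reached and a fake nn2 out of safe mode, `failFast` throws on nn1 just like the reported `dfsadmin` failure, while `tolerant` still reports nn2 as OFF and lists nn1 as unreachable. A real change would also have to decide whether an empty result (no NameNode reachable) should still fail the command.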



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
