Andre Araujo created HDFS-11654:
-----------------------------------

             Summary: dfs.nameservice.id and dfs.ha.namenode.id are being 
ignored by HDFS clients
                 Key: HDFS-11654
                 URL: https://issues.apache.org/jira/browse/HDFS-11654
             Project: Hadoop HDFS
          Issue Type: Bug
          Components: hdfs-client
    Affects Versions: 2.6.0
            Reporter: Andre Araujo


{{hdfs fsck}} fails with the following stack trace when using a cluster with 
multiple service names configured locally:

{code}
# hdfs fsck /
Exception in thread "main" org.apache.hadoop.HadoopIllegalArgumentException: 
Configuration has multiple addresses that match local node's address. Please 
configure the system with dfs.nameservice.id and dfs.ha.namenode.id
        at org.apache.hadoop.hdfs.DFSUtil.getSuffixIDs(DFSUtil.java:1415)
        at org.apache.hadoop.hdfs.DFSUtil.getSuffixIDs(DFSUtil.java:1435)
        at org.apache.hadoop.hdfs.DFSUtil.getInfoServer(DFSUtil.java:1130)
        at 
org.apache.hadoop.hdfs.tools.DFSck.getCurrentNamenodeAddress(DFSck.java:248)
        at org.apache.hadoop.hdfs.tools.DFSck.doWork(DFSck.java:255)
        at org.apache.hadoop.hdfs.tools.DFSck.access$000(DFSck.java:72)
        at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:148)
        at org.apache.hadoop.hdfs.tools.DFSck$1.run(DFSck.java:145)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1709)
        at org.apache.hadoop.hdfs.tools.DFSck.run(DFSck.java:144)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
        at org.apache.hadoop.hdfs.tools.DFSck.main(DFSck.java:360)
{code}
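The exception originates in {{DFSUtil.getSuffixIDs}}, which tries to infer the local node's (nameservice, namenode) pair by matching the configured NameNode addresses against the local node's address. A minimal sketch (plain Java, not Hadoop code; the map layout and method name here are hypothetical) of why two nameservices that reuse the same host:port make that reverse lookup ambiguous:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the reverse lookup done by DFSUtil.getSuffixIDs:
// given the configured "<nameserviceId>.<namenodeId> -> address" entries,
// find every suffix whose address matches the local node's address.
public class SuffixIdAmbiguity {
    static List<String> matchingSuffixes(Map<String, String> addrBySuffix,
                                         String localAddr) {
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, String> e : addrBySuffix.entrySet()) {
            if (e.getValue().equals(localAddr)) {
                matches.add(e.getKey());
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        Map<String, String> conf = new LinkedHashMap<>();
        // Both nameservices publish the same RPC address for node-1:
        conf.put("nameservice1.nn001", "node-1.example.com:8020");
        conf.put("newns1.nn101", "node-1.example.com:8020");

        List<String> matches = matchingSuffixes(conf, "node-1.example.com:8020");
        // Two matches -> no unique (nameserviceId, namenodeId) can be chosen,
        // which is when getSuffixIDs throws HadoopIllegalArgumentException.
        System.out.println(matches);  // prints [nameservice1.nn001, newns1.nn101]
    }
}
```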

This cluster originally had only one service name, {{nameservice1}}, with the 
following configuration:

{code}
dfs.nameservices = nameservice1
dfs.client.failover.proxy.provider.nameservice1 = 
org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
dfs.ha.automatic-failover.enabled.nameservice1 = true
dfs.ha.namenodes.nameservice1 = nn001,nn002
dfs.namenode.rpc-address.nameservice1.nn001 = node-1.example.com:8020
dfs.namenode.servicerpc-address.nameservice1.nn001 = node-1.example.com:8022
dfs.namenode.http-address.nameservice1.nn001 = node-1.example.com:20101
dfs.namenode.https-address.nameservice1.nn001 = node-1.example.com:20102
dfs.namenode.rpc-address.nameservice1.nn002 = node-2.example.com:8020
dfs.namenode.servicerpc-address.nameservice1.nn002 = node-2.example.com:8022
dfs.namenode.http-address.nameservice1.nn002 = node-2.example.com:20101
dfs.namenode.https-address.nameservice1.nn002 = node-2.example.com:20102
{code}

And then I added a new service name for the cluster with the following 
configuration:

{code}
dfs.nameservices = nameservice1,newns1
dfs.client.failover.proxy.provider.newns1 = 
org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider
dfs.ha.automatic-failover.enabled.newns1 = true
dfs.ha.namenodes.newns1 = nn101,nn102
dfs.namenode.rpc-address.newns1.nn101 = node-1.example.com:8020
dfs.namenode.servicerpc-address.newns1.nn101 = node-1.example.com:8022
dfs.namenode.http-address.newns1.nn101 = node-1.example.com:20101
dfs.namenode.https-address.newns1.nn101 = node-1.example.com:20102
dfs.namenode.rpc-address.newns1.nn102 = node-2.example.com:8020
dfs.namenode.servicerpc-address.newns1.nn102 = node-2.example.com:8022
dfs.namenode.http-address.newns1.nn102 = node-2.example.com:20101
dfs.namenode.https-address.newns1.nn102 = node-2.example.com:20102
{code}
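Note that {{newns1}} maps its NameNodes to exactly the same host:port pairs already used by {{nameservice1}}, which is what later makes address-based lookups ambiguous. For reference, a fragment of how a few of these properties would look in {{hdfs-site.xml}} (a hypothetical rendering of the values listed above):

```xml
<!-- Excerpt of hdfs-site.xml for the second nameservice; values copied from above -->
<property>
  <name>dfs.nameservices</name>
  <value>nameservice1,newns1</value>
</property>
<property>
  <name>dfs.ha.namenodes.newns1</name>
  <value>nn101,nn102</value>
</property>
<property>
  <name>dfs.namenode.rpc-address.newns1.nn101</name>
  <value>node-1.example.com:8020</value>
</property>
```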

After that change, users can access the cluster as expected (put, get, etc.) 
using any of these URIs:

* {{/path/to/file}}
* {{hdfs://nameservice1/path/to/file}}
* {{hdfs://newns1/path/to/file}}

{{fsck}}, however, breaks with the same error shown above. The error persists 
even when following the advice in the error message and setting the properties 
{{dfs.nameservice.id}} and {{dfs.ha.namenode.id}}; they have no effect.




--
This message was sent by Atlassian JIRA
(v6.3.15#6346)
