Stephen O'Donnell created HDFS-15919:
----------------------------------------
Summary: BlockPoolManager should log stack trace if unable to get
Namenode addresses
Key: HDFS-15919
URL: https://issues.apache.org/jira/browse/HDFS-15919
Project: Hadoop HDFS
Issue Type: Improvement
Components: datanode
Affects Versions: 3.4.0
Reporter: Stephen O'Donnell
Assignee: Stephen O'Donnell
If the HDFS configuration is invalid, the datanode can fail to start with
this stack trace:
{code}
2021-03-24 05:58:27,026 INFO datanode.DataNode (BlockPoolManager.java:refreshNamenodes(149)) - Refresh request received for nameservices: null
2021-03-24 05:58:27,033 WARN datanode.DataNode (BlockPoolManager.java:refreshNamenodes(161)) - Unable to get NameNode addresses.
...
2021-03-24 05:58:27,077 ERROR datanode.DataNode (DataNode.java:secureMain(2883)) - Exception in secureMain
java.io.IOException: No services to connect, missing NameNode address.
    at org.apache.hadoop.hdfs.server.datanode.BlockPoolManager.refreshNamenodes(BlockPoolManager.java:165)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.startDataNode(DataNode.java:1440)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.<init>(DataNode.java:500)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.makeInstance(DataNode.java:2782)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.instantiateDataNode(DataNode.java:2690)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.createDataNode(DataNode.java:2732)
    at org.apache.hadoop.hdfs.server.datanode.DataNode.secureMain(DataNode.java:2876)
    at org.apache.hadoop.hdfs.server.datanode.SecureDataNodeStarter.start(SecureDataNodeStarter.java:100)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.commons.daemon.support.DaemonLoader.start(DaemonLoader.java:243)
{code}
In this case, the cause was an exception thrown in
DFSUtil.getNNServiceRpcAddressesForCluster(...), but a couple of scenarios
within that method can throw, and the WARN message does not say which one
occurred, so it's difficult to figure out what is wrong with the config.
We should simply add the exception to the existing log message when an error
occurs so it is clear what caused it.
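
As a rough illustration of the proposed change, here is a minimal sketch. It
is not the actual BlockPoolManager code: the class RefreshLoggingSketch, the
helper lookupAddresses(), the two failure messages and the address
nn.example.com:8020 are all hypothetical, and java.util.logging is used only
to keep the example dependency-free. The point is simply that passing the
caught exception to the logger records the underlying cause next to the
existing message.

```java
import java.io.IOException;
import java.util.Collections;
import java.util.Map;
import java.util.logging.Level;
import java.util.logging.Logger;

class RefreshLoggingSketch {
  private static final Logger LOG = Logger.getLogger("datanode.DataNode");

  // Hypothetical stand-in for DFSUtil.getNNServiceRpcAddressesForCluster(...):
  // it can fail for more than one reason, each with a different message.
  static Map<String, String> lookupAddresses(String nameservices)
      throws IOException {
    if (nameservices == null) {
      throw new IOException("hypothetical cause A: no nameservices configured");
    }
    if (nameservices.isEmpty()) {
      throw new IOException("hypothetical cause B: nameservices list is empty");
    }
    return Collections.singletonMap(nameservices, "nn.example.com:8020");
  }

  static Map<String, String> refresh(String nameservices) throws IOException {
    Map<String, String> addrs = Collections.emptyMap();
    try {
      addrs = lookupAddresses(nameservices);
    } catch (IOException ioe) {
      // Logging only the fixed message would drop ioe and hide the cause.
      // Passing ioe as the last argument records its message and stack trace.
      LOG.log(Level.WARNING, "Unable to get NameNode addresses.", ioe);
    }
    if (addrs.isEmpty()) {
      throw new IOException(
          "No services to connect, missing NameNode address.");
    }
    return addrs;
  }

  public static void main(String[] args) {
    try {
      refresh(null);
    } catch (IOException e) {
      System.err.println("startup failed: " + e.getMessage());
    }
  }
}
```

With this shape, the WARN entry carries the specific cause (A or B in the
sketch), so the later "No services to connect" error no longer has to be
diagnosed on its own.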