[ https://issues.apache.org/jira/browse/HDFS-12910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16286300#comment-16286300 ]
Stephen O'Donnell commented on HDFS-12910:
------------------------------------------

I would really like the messages that go to System.err in this issue to get into the DN role log. From a support perspective, users don't tend to look in the jsvc.err file, and so cannot easily find this issue when it occurs.

However, I am not sure that is feasible here. When jsvc runs the methods in "SecureDataNodeStarter", they run as root, which allows them to bind ports under 1024. Then, when the DN proper starts, the user is switched to hdfs. So while we could use the usual log4j logger for these messages, the role log would initially be created by root, and the DN running as hdfs would then not be able to write to it.

I guess that is why the pattern of writing messages to System.err is already used in SecureDataNodeStarter: it avoids bad ownership on the role log.

> Secure Datanode Starter should log the port when it
> ----------------------------------------------------
>
>                 Key: HDFS-12910
>                 URL: https://issues.apache.org/jira/browse/HDFS-12910
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>    Affects Versions: 3.1.0
>            Reporter: Stephen O'Donnell
>            Assignee: Stephen O'Donnell
>            Priority: Minor
>         Attachments: HDFS-12910.001.patch, HDFS-12910.002.patch
>
>
> When running a secure data node, the default ports it uses are 1004 and 1006.
> Sometimes other OS services can start on these ports, causing the DN to fail
> to start (e.g. the nfs service can use random ports under 1024).
> When this happens an error is logged by jsvc, but it is confusing as it does
> not tell you which port it is having issues binding to. For example, when
> port 1004 is used by another process:
> {code}
> Initializing secure datanode resources
> java.net.BindException: Address already in use
>         at sun.nio.ch.Net.bind0(Native Method)
>         at sun.nio.ch.Net.bind(Net.java:433)
>         at sun.nio.ch.Net.bind(Net.java:425)
>         at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
>         at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>         at org.apache.hadoop.hdfs.server.datanode.SecureDataNodeStarter.getSecureResources(SecureDataNodeStarter.java:105)
>         at org.apache.hadoop.hdfs.server.datanode.SecureDataNodeStarter.init(SecureDataNodeStarter.java:71)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.commons.daemon.support.DaemonLoader.load(DaemonLoader.java:207)
> Cannot load daemon
> Service exit with a return value of 3
> {code}
> And when port 1006 is used:
> {code}
> Opened streaming server at /0.0.0.0:1004
> java.net.BindException: Address already in use
>         at sun.nio.ch.Net.bind0(Native Method)
>         at sun.nio.ch.Net.bind(Net.java:433)
>         at sun.nio.ch.Net.bind(Net.java:425)
>         at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
>         at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
>         at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:67)
>         at org.apache.hadoop.hdfs.server.datanode.SecureDataNodeStarter.getSecureResources(SecureDataNodeStarter.java:129)
>         at org.apache.hadoop.hdfs.server.datanode.SecureDataNodeStarter.init(SecureDataNodeStarter.java:71)
>         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>         at java.lang.reflect.Method.invoke(Method.java:498)
>         at org.apache.commons.daemon.support.DaemonLoader.load(DaemonLoader.java:207)
> Cannot load daemon
> Service exit with a return value of 3
> {code}
> We should catch the BindException, log the problem address:port, and then
> re-throw the exception to make the problem clearer.
> I will upload a patch for this.
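
As a rough illustration of the catch, log and re-throw approach proposed in the description, a minimal Java sketch could look like the following. It is not the code from the attached patches: the class name BindLoggingSketch, the helper bindOrReport and the message wording are all hypothetical, and it binds an ephemeral loopback port so that it runs without root. The message goes to System.err, matching the pattern the comment describes for SecureDataNodeStarter.

```java
import java.io.IOException;
import java.net.BindException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;

public class BindLoggingSketch {

  /**
   * Hypothetical helper (not from the HDFS-12910 patches): binds a server
   * socket and, if the bind fails, reports which address:port was the problem.
   */
  static ServerSocket bindOrReport(InetSocketAddress addr, int backlog)
      throws IOException {
    ServerSocket ss = new ServerSocket();
    try {
      ss.bind(addr, backlog);
      return ss;
    } catch (BindException e) {
      ss.close();
      // Build a new BindException that names the address, keeping the
      // original exception as the cause so the stack trace is not lost.
      BindException named = new BindException(
          "Problem binding to " + addr + ": " + e.getMessage());
      named.initCause(e);
      // Written to System.err rather than a log4j logger, assuming the same
      // file-ownership concern described in the comment applies.
      System.err.println(named.getMessage());
      throw named;
    }
  }

  public static void main(String[] args) throws IOException {
    InetAddress loopback = InetAddress.getLoopbackAddress();
    // Port 0 asks the OS for any free port, so no root privileges are needed.
    try (ServerSocket first = bindOrReport(new InetSocketAddress(loopback, 0), 1)) {
      InetSocketAddress taken =
          new InetSocketAddress(loopback, first.getLocalPort());
      // Binding the same address:port again is expected to fail.
      try (ServerSocket second = bindOrReport(taken, 1)) {
        System.out.println("Unexpectedly bound " + second.getLocalSocketAddress());
      } catch (BindException e) {
        System.out.println("Caught: " + e.getMessage());
      }
    }
  }
}
```

Running main makes the second bind fail on purpose. The line written to System.err then names the address and port, rather than showing only "Address already in use".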