[ https://issues.apache.org/jira/browse/HDFS-10986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15565940#comment-15565940 ]
Xiaobing Zhou commented on HDFS-10986:
--------------------------------------

[~liuml07] thanks for the patch, LGTM. Minor comments:

1. You might want to log it as a warning or error message.
{code}
234        LOG.info("stderr: " + err);
{code}

2. You may also want to run a sanity check in a local build. I got an error like:
{noformat}
java.lang.Exception: test timed out after 60000 milliseconds
    at sun.nio.ch.KQueueArrayWrapper.kevent0(Native Method)
    at sun.nio.ch.KQueueArrayWrapper.poll(KQueueArrayWrapper.java:198)
    at sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:117)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
    at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:335)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:203)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:681)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:777)
    at org.apache.hadoop.ipc.Client$Connection.access$3500(Client.java:408)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1542)
    at org.apache.hadoop.ipc.Client.call(Client.java:1373)
    at org.apache.hadoop.ipc.Client.call(Client.java:1337)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
    at com.sun.proxy.$Proxy26.getDatanodeInfo(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientDatanodeProtocolTranslatorPB.getDatanodeInfo(ClientDatanodeProtocolTranslatorPB.java:270)
    at org.apache.hadoop.hdfs.tools.DFSAdmin.getDatanodeInfo(DFSAdmin.java:2217)
    at org.apache.hadoop.hdfs.tools.DFSAdmin.run(DFSAdmin.java:2079)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
    at org.apache.hadoop.hdfs.tools.TestDFSAdmin.testDFSAdminUnreachableDatanode(TestDFSAdmin.java:214)
{noformat}
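For illustration, a minimal sketch of the change suggested in point 1, assuming an slf4j-style logger; the surrounding class and method are hypothetical scaffolding, and only the quoted LOG line comes from the patch.

{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical scaffolding around the quoted line; not the actual test code.
class StderrLoggingSketch {
  private static final Logger LOG = LoggerFactory.getLogger(StderrLoggingSketch.class);

  static void reportStreams(String out, String err) {
    LOG.info("stdout: " + out);
    // Suggested change: stderr output usually indicates a problem, so log it
    // at WARN (or ERROR) instead of INFO so it is visible at default levels.
    LOG.warn("stderr: " + err);
  }
}
{code}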
> DFSAdmin should log detailed error message if any
> -------------------------------------------------
>
>                 Key: HDFS-10986
>                 URL: https://issues.apache.org/jira/browse/HDFS-10986
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: tools
>            Reporter: Mingliang Liu
>            Assignee: Mingliang Liu
>         Attachments: HDFS-10986-branch-2.8.002.patch, HDFS-10986.000.patch, HDFS-10986.001.patch, HDFS-10986.002.patch
>
> Some subcommands in {{DFSAdmin}} swallow IOException and print a very limited error message, if any, to stderr.
> {code}
> $ hdfs dfsadmin -getBalancerBandwidth 127.0.0.1:9866
> Datanode unreachable.
> $ hdfs dfsadmin -getDatanodeInfo localhost:9866
> Datanode unreachable.
> $ hdfs dfsadmin -evictWriters 127.0.0.1:9866
> $ echo $?
> -1
> {code}
> The user cannot get the exception stack even when the LOG level is DEBUG, which is not very user friendly. Fortunately, if the port number is not accessible (say 9999), users can infer the detailed error message from the IPC logs:
> {code}
> $ hdfs dfsadmin -getBalancerBandwidth 127.0.0.1:9999
> 2016-10-07 18:01:35,115 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 2016-10-07 18:01:36,335 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9999. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> .....
> 2016-10-07 18:01:45,361 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9999. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 2016-10-07 18:01:45,362 WARN ipc.Client: Failed to connect to server: localhost/127.0.0.1:9999: retries get failed due to exceeded maximum allowed retries number: 10
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>     ...
>     at org.apache.hadoop.hdfs.tools.DFSAdmin.run(DFSAdmin.java:2073)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>     at org.apache.hadoop.hdfs.tools.DFSAdmin.main(DFSAdmin.java:2225)
> Datanode unreachable.
> {code}
> We should fix this by providing a detailed error message. Actually, {{DFSAdmin#run}} already handles exceptions carefully, including:
> # set the exit return value to -1
> # print the error message
> # log the exception stack trace (at DEBUG level)
> All we need to do is not swallow exceptions without good reason.
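For illustration, a minimal sketch of the pattern the last quoted paragraph describes, with hypothetical stand-in types (the DatanodeProxy interface here is not a Hadoop API): print the short message for the user, but rethrow the IOException so the caller, {{DFSAdmin#run}} in the real code, can set the -1 exit value and log the stack trace at DEBUG level.

{code}
import java.io.IOException;

// Hedged sketch of "don't swallow the exception"; the proxy interface is an
// illustrative stand-in, not the real ClientDatanodeProtocol.
class DfsAdminSketch {
  interface DatanodeProxy {
    String getDatanodeInfo() throws IOException;
  }

  static int getDatanodeInfo(DatanodeProxy proxy) throws IOException {
    try {
      System.out.println(proxy.getDatanodeInfo());
    } catch (IOException ioe) {
      // Before the fix: the exception stopped here, and the user saw only
      // "Datanode unreachable." with exit code -1 and no stack trace.
      System.err.println("Datanode unreachable.");
      // After the fix: propagate the exception so the caller can print the
      // detailed error message and log the stack trace at DEBUG level.
      throw ioe;
    }
    return 0;
  }
}
{code}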