[ https://issues.apache.org/jira/browse/HDFS-10986?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15565940#comment-15565940 ]
Xiaobing Zhou commented on HDFS-10986:
--------------------------------------

[~liuml07] thanks for the patch, LGTM. Minor comments:

1. You might want to log it as a warning or error message.
{code}
234        LOG.info("stderr: " + err);
{code}

2. You may also want to run a sanity check in a local build. I got an error like:
{noformat}
java.lang.Exception: test timed out after 60000 milliseconds
    at sun.nio.ch.KQueueArrayWrapper.kevent0(Native Method)
    at sun.nio.ch.KQueueArrayWrapper.poll(KQueueArrayWrapper.java:198)
    at sun.nio.ch.KQueueSelectorImpl.doSelect(KQueueSelectorImpl.java:117)
    at sun.nio.ch.SelectorImpl.lockAndDoSelect(SelectorImpl.java:86)
    at sun.nio.ch.SelectorImpl.select(SelectorImpl.java:97)
    at org.apache.hadoop.net.SocketIOWithTimeout$SelectorPool.select(SocketIOWithTimeout.java:335)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:203)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:531)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:495)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:681)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:777)
    at org.apache.hadoop.ipc.Client$Connection.access$3500(Client.java:408)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1542)
    at org.apache.hadoop.ipc.Client.call(Client.java:1373)
    at org.apache.hadoop.ipc.Client.call(Client.java:1337)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:228)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
    at com.sun.proxy.$Proxy26.getDatanodeInfo(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientDatanodeProtocolTranslatorPB.getDatanodeInfo(ClientDatanodeProtocolTranslatorPB.java:270)
    at org.apache.hadoop.hdfs.tools.DFSAdmin.getDatanodeInfo(DFSAdmin.java:2217)
    at org.apache.hadoop.hdfs.tools.DFSAdmin.run(DFSAdmin.java:2079)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
    at org.apache.hadoop.hdfs.tools.TestDFSAdmin.testDFSAdminUnreachableDatanode(TestDFSAdmin.java:214)
{noformat}
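For illustration, a minimal sketch of the change suggested in point 1, assuming an slf4j-style logger; the surrounding class and method are hypothetical scaffolding, and only the quoted LOG line comes from the patch.

{code}
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

// Hypothetical scaffolding around the quoted line; not the actual test code.
class StderrLoggingSketch {
  private static final Logger LOG = LoggerFactory.getLogger(StderrLoggingSketch.class);

  static void reportStreams(String out, String err) {
    LOG.info("stdout: " + out);
    // Suggested change: stderr output usually indicates a problem, so log it
    // at WARN (or ERROR) instead of INFO so it is visible at default levels.
    LOG.warn("stderr: " + err);
  }
}
{code}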
> DFSAdmin should log detailed error message if any
> -------------------------------------------------
>
>                 Key: HDFS-10986
>                 URL: https://issues.apache.org/jira/browse/HDFS-10986
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: tools
>            Reporter: Mingliang Liu
>            Assignee: Mingliang Liu
>         Attachments: HDFS-10986-branch-2.8.002.patch, HDFS-10986.000.patch, HDFS-10986.001.patch, HDFS-10986.002.patch
>
> Some subcommands in {{DFSAdmin}} swallow IOException and print a very limited error message, if any, to stderr.
> {code}
> $ hdfs dfsadmin -getBalancerBandwidth 127.0.0.1:9866
> Datanode unreachable.
> $ hdfs dfsadmin -getDatanodeInfo localhost:9866
> Datanode unreachable.
> $ hdfs dfsadmin -evictWriters 127.0.0.1:9866
> $ echo $?
> -1
> {code}
> The user cannot get the exception stack even when the LOG level is DEBUG, which is not very user friendly. Fortunately, if the port number is not accessible (say 9999), users can infer the detailed error message from the IPC logs:
> {code}
> $ hdfs dfsadmin -getBalancerBandwidth 127.0.0.1:9999
> 2016-10-07 18:01:35,115 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
> 2016-10-07 18:01:36,335 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9999. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> .....
> 2016-10-07 18:01:45,361 INFO ipc.Client: Retrying connect to server: localhost/127.0.0.1:9999. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
> 2016-10-07 18:01:45,362 WARN ipc.Client: Failed to connect to server: localhost/127.0.0.1:9999: retries get failed due to exceeded maximum allowed retries number: 10
> java.net.ConnectException: Connection refused
>     at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
>     at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
>     ...
>     at org.apache.hadoop.hdfs.tools.DFSAdmin.run(DFSAdmin.java:2073)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
>     at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:90)
>     at org.apache.hadoop.hdfs.tools.DFSAdmin.main(DFSAdmin.java:2225)
> Datanode unreachable.
> {code}
> We should fix this by providing a detailed error message. Actually, {{DFSAdmin#run}} already handles exceptions carefully, including:
> # set the exit return value to -1
> # print the error message
> # log the exception stack trace (at DEBUG level)
> All we need to do is not swallow exceptions without good reason.
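For illustration, a minimal sketch of the pattern the last quoted paragraph describes, with hypothetical stand-in types (the DatanodeProxy interface here is not a Hadoop API): print the short message for the user, but rethrow the IOException so the caller, {{DFSAdmin#run}} in the real code, can set the -1 exit value and log the stack trace at DEBUG level.

{code}
import java.io.IOException;

// Hedged sketch of "don't swallow the exception"; the proxy interface is an
// illustrative stand-in, not the real ClientDatanodeProtocol.
class DfsAdminSketch {
  interface DatanodeProxy {
    String getDatanodeInfo() throws IOException;
  }

  static int getDatanodeInfo(DatanodeProxy proxy) throws IOException {
    try {
      System.out.println(proxy.getDatanodeInfo());
    } catch (IOException ioe) {
      // Before the fix: the exception stopped here, and the user saw only
      // "Datanode unreachable." with exit code -1 and no stack trace.
      System.err.println("Datanode unreachable.");
      // After the fix: propagate the exception so the caller can print the
      // detailed error message and log the stack trace at DEBUG level.
      throw ioe;
    }
    return 0;
  }
}
{code}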