[
https://issues.apache.org/jira/browse/HDFS-6308?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13985319#comment-13985319
]
Binglin Chang commented on HDFS-6308:
-------------------------------------
Related error log:
{code}
2014-04-28 05:18:19,700 TRACE ipc.ProtobufRpcEngine
(ProtobufRpcEngine.java:invoke(197)) - 1418: Call -> /127.0.0.1:58789:
getHdfsBlockLocations {tokens { identifier: "" password: "" kind: "" service:
"" } tokens { identifier: "" password: "" kind: "" service: "" } blockPoolId:
"BP-1664789652-67.195.138.24-1398662297553" blockIds: 1073741825 blockIds:
1073741826}
2014-04-28 05:18:19,700 TRACE ipc.ProtobufRpcEngine
(ProtobufRpcEngine.java:invoke(197)) - 1419: Call -> /127.0.0.1:45933:
getHdfsBlockLocations {tokens { identifier: "" password: "" kind: "" service:
"" } tokens { identifier: "" password: "" kind: "" service: "" } blockPoolId:
"BP-1664789652-67.195.138.24-1398662297553" blockIds: 1073741825 blockIds:
1073741826}
2014-04-28 05:18:19,701 TRACE ipc.ProtobufRpcEngine
(ProtobufRpcEngine.java:invoke(211)) - 1418: Exception <-
localhost/127.0.0.1:58789: getHdfsBlockLocations {java.net.ConnectException:
Call From asf000.sp2.ygridcore.net/67.195.138.24 to localhost:58789 failed on
connection exception: java.net.ConnectException: Connection refused; For more
details see: http://wiki.apache.org/hadoop/ConnectionRefused}
2014-04-28 05:18:19,701 INFO ipc.Server (Server.java:doRead(762)) - Socket
Reader #1 for port 45933: readAndProcess from client 127.0.0.1 threw exception
[java.io.IOException: Connection reset by peer]
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)
at sun.nio.ch.IOUtil.read(IOUtil.java:171)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:243)
at org.apache.hadoop.ipc.Server.channelRead(Server.java:2644)
at org.apache.hadoop.ipc.Server.access$2800(Server.java:133)
at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1517)
at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:753)
at org.apache.hadoop.ipc.Server$Listener$Reader.doRunLoop(Server.java:627)
at org.apache.hadoop.ipc.Server$Listener$Reader.run(Server.java:598)
2014-04-28 05:18:19,702 TRACE ipc.ProtobufRpcEngine
(ProtobufRpcEngine.java:invoke(211)) - 1419: Exception <- /127.0.0.1:45933:
getHdfsBlockLocations {java.net.SocketTimeoutException: Call From
asf000.sp2.ygridcore.net/67.195.138.24 to localhost:45933 failed on socket
timeout exception: java.net.SocketTimeoutException: 1500 millis timeout while
waiting for channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected local=/127.0.0.1:56102
remote=/127.0.0.1:45933]; For more details see:
http://wiki.apache.org/hadoop/SocketTimeout}
2014-04-28 05:18:19,702 TRACE ipc.ProtobufRpcEngine
(ProtobufRpcEngine.java:invoke(211)) - 1415: Exception <-
localhost/127.0.0.1:45933: getHdfsBlockLocations
{java.net.SocketTimeoutException: Call From
asf000.sp2.ygridcore.net/67.195.138.24 to localhost:45933 failed on socket
timeout exception: java.net.SocketTimeoutException: 1500 millis timeout while
waiting for channel to be ready for read. ch :
java.nio.channels.SocketChannel[connected local=/127.0.0.1:56102
remote=/127.0.0.1:45933]; For more details see:
{code}
The socket read/write timeout is set to 1500 ms. The timeout error is global
(per connection), so when a timeout occurs, all calls on that connection are
marked as timed out. The expected behavior is that the first call times out
and the second call completes normally.
A simple fix is to invoke the second call only after the connection is known
to be closed.
We can consider improving ipc.Client to prevent this kind of corner case later.
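
To illustrate the failure mode described above, here is a minimal sketch. It is not the real ipc.Client code: the Connection class, its methods, and the CallState enum are hypothetical stand-ins that only model the bookkeeping. The call ids 1415 and 1419 are taken from the log, where both time out on the same socket.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class SharedConnectionSketch {
    enum CallState { PENDING, TIMED_OUT, DONE }

    /** Hypothetical model of one RPC connection; NOT the real ipc.Client. */
    static class Connection {
        private final Map<Integer, CallState> calls = new LinkedHashMap<>();
        private boolean closed = false;

        void send(int callId) {
            if (closed) {
                throw new IllegalStateException("connection is closed");
            }
            calls.put(callId, CallState.PENDING);
        }

        /** A read timeout is per connection: every pending call is failed. */
        void onReadTimeout() {
            calls.replaceAll((id, s) -> s == CallState.PENDING ? CallState.TIMED_OUT : s);
            closed = true;
        }

        void complete(int callId) {
            calls.put(callId, CallState.DONE);
        }

        CallState state(int callId) {
            return calls.get(callId);
        }

        boolean isClosed() {
            return closed;
        }
    }

    public static void main(String[] args) {
        // Flaky ordering: the second call is sent while the first is still
        // pending on the same connection, so one timeout fails both.
        Connection shared = new Connection();
        shared.send(1415);
        shared.send(1419);
        shared.onReadTimeout();
        System.out.println("1415: " + shared.state(1415));
        System.out.println("1419: " + shared.state(1419));

        // Fixed ordering: issue the second call only once the first
        // connection is closed, so it runs on a fresh connection.
        Connection first = new Connection();
        first.send(1);
        first.onReadTimeout();
        if (first.isClosed()) {
            Connection second = new Connection();
            second.send(2);
            second.complete(2);
            System.out.println("1: " + first.state(1));
            System.out.println("2: " + second.state(2));
        }
    }
}
```

In this model the first scenario marks both calls as timed out, which matches the two SocketTimeoutException entries in the log, while the second scenario gives the expected "one invalid, one valid" outcome.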
> TestDistributedFileSystem#testGetFileBlockStorageLocationsError is flaky
> ------------------------------------------------------------------------
>
> Key: HDFS-6308
> URL: https://issues.apache.org/jira/browse/HDFS-6308
> Project: Hadoop HDFS
> Issue Type: Bug
> Reporter: Binglin Chang
>
> Found this on pre-commit build of HDFS-6261
> {code}
> java.lang.AssertionError: Expected one valid and one invalid volume
> at org.junit.Assert.fail(Assert.java:88)
> at org.junit.Assert.assertTrue(Assert.java:41)
> at org.apache.hadoop.hdfs.TestDistributedFileSystem.testGetFileBlockStorageLocationsError(TestDistributedFileSystem.java:837)
> {code}