[ https://issues.apache.org/jira/browse/HDFS-16565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

JiangHua Zhu updated HDFS-16565:
--------------------------------
    Description: 
When DataTransfer runs, the local DataNode connects to another DataNode over a 
socket. If the connection cannot be established, a NoRouteToHostException can 
be thrown.
Exception information:
2022-04-29 15:47:47,931 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(xxxx.xxxx.xxxx.xxxx:1004, datanodeUuid=xxxx.xxxx.xxxx.xxxx, infoPort=1006, infoSecurePort=0, ipcPort=8025, storageInfo=lv=-57;cid=xxxx.xxxx.xxxx.xxxx;nsid=961284063;c=1589290804417):Failed to transfer BP-1375239094-xxxx.xxxx.xxxx.xxxx-1589290804417:blk_-9223372035798255743_66037710 to xxxx.xxxx.xxx.xxxx:1004 got
java.net.NoRouteToHostException: No route to host
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:533)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:497)
    at org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer.run(DataNode.java:2562)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

The code where the exception originates:
    sock = newSocket();
    NetUtils.connect(sock, curTarget, dnConf.socketTimeout);
    sock.setTcpNoDelay(dnConf.getDataTransferServerTcpNoDelay());
    sock.setSoTimeout(targets.length * dnConf.socketTimeout);

When a NoRouteToHostException occurs, the block is added to the 
VolumeScanner, which then starts scanning the block. This should not happen, 
because the exception indicates a network problem between DataNodes, not an 
I/O error on the block itself.
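
One possible way to keep the two cases apart is sketched below. This is only an 
illustration, not the actual HDFS change: the class TransferErrorSketch, the 
method transfer, and the suspectBlocks list are hypothetical stand-ins for 
DataTransfer and its bad-block handling. The sketch relies on the fact that 
java.net.NoRouteToHostException is a subclass of IOException, so a general 
IOException handler catches it unless a more specific catch clause comes first.

```java
import java.io.IOException;
import java.net.NoRouteToHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for DataNode$DataTransfer; not real HDFS code.
final class TransferErrorSketch {

  // Hypothetical stand-in for the blocks handed over for scanning.
  static final List<String> suspectBlocks = new ArrayList<>();

  // Simulates one transfer attempt. A non-null simulatedFailure is thrown
  // in place of the real socket connect and data transfer.
  static String transfer(String block, IOException simulatedFailure) {
    try {
      if (simulatedFailure != null) {
        throw simulatedFailure;
      }
      return "transferred";
    } catch (NoRouteToHostException e) {
      // The target is unreachable: report it, but do not treat the
      // local block as suspect.
      return "network-failure";
    } catch (IOException e) {
      // Any other I/O failure: mark the block so it gets scanned.
      suspectBlocks.add(block);
      return "bad-block";
    }
  }

  public static void main(String[] args) {
    System.out.println(
        transfer("blk_1", new NoRouteToHostException("No route to host")));
    System.out.println(transfer("blk_2", new IOException("read error")));
    System.out.println("suspect blocks: " + suspectBlocks);
  }
}
```

The more specific catch clause has to come first; with the order reversed the 
code does not compile, because the NoRouteToHostException clause would be 
unreachable.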


  was:
When DataTransfer runs, the local DataNode connects to another DataNode over a 
socket. If the connection cannot be established, a NoRouteToHostException can 
be thrown.
Exception information:
2022-04-29 15:47:47,931 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: DatanodeRegistration(xxxx.xxxx.xxxx.xxxx:1004, datanodeUuid=xxxx.xxxx.xxxx.xxxx, infoPort=1006, infoSecurePort=0, ipcPort=8025, storageInfo=lv=-57;cid=xxxx.xxxx.xxxx.xxxx;nsid=961284063;c=1589290804417):Failed to transfer BP-1375239094-xxxx.xxxx.xxxx.xxxx-1589290804417:blk_-9223372035798255743_66037710 to xxxx.xxxx.xxx.xxxx:1004 got
java.net.NoRouteToHostException: No route to host
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:533)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:497)
    at org.apache.hadoop.hdfs.server.datanode.DataNode$DataTransfer.run(DataNode.java:2562)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)

The code where the exception originates:
    sock = newSocket();
    NetUtils.connect(sock, curTarget, dnConf.socketTimeout);
    sock.setTcpNoDelay(dnConf.getDataTransferServerTcpNoDelay());
    sock.setSoTimeout(targets.length * dnConf.socketTimeout);

When a NoRouteToHostException occurs, the block is added to the 
VolumeScanner, which then starts scanning the block. This should not happen, 
because the exception indicates a network problem between DataNodes, not an 
I/O error on the block itself.
    catch (IOException ie) {
      handleBadBlock(b, ie, false);
      LOG.warn("{}:Failed to transfer {} to {} got",
          bpReg, b, targets[0], ie);
    }



> Optimize DataNode#DataTransfer, when encountering NoRouteToHostException
> ------------------------------------------------------------------------
>
>                 Key: HDFS-16565
>                 URL: https://issues.apache.org/jira/browse/HDFS-16565
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: datanode
>    Affects Versions: 3.3.0
>            Reporter: JiangHua Zhu
>            Assignee: JiangHua Zhu
>            Priority: Major


