[ https://issues.apache.org/jira/browse/HDFS-17357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17830320#comment-17830320 ]
ASF GitHub Bot commented on HDFS-17357:
---------------------------------------

LiuGuH commented on PR #6502:
URL: https://github.com/apache/hadoop/pull/6502#issuecomment-2017106408

Thanks for the review, @zhangshuyan0. This situation may occur on datanodes under heavy IO load, but I cannot reproduce it in a unit test. It is not related to DFS_DATANODE_MAX_RECEIVER_THREADS_KEY. On datanodes under heavy IO load, in.close() and out.close() may also throw an IOException when close() is invoked, and the socket may then not really be closed.

> NioInetPeer.close() should close socket connection.
> ---------------------------------------------------
>
>                 Key: HDFS-17357
>                 URL: https://issues.apache.org/jira/browse/HDFS-17357
>             Project: Hadoop HDFS
>          Issue Type: Bug
>            Reporter: liuguanghua
>            Assignee: liuguanghua
>            Priority: Major
>              Labels: pull-request-available
>
> NioInetPeer.close() currently does not close the socket connection.
> I found more than 30,000 leaked connections on a datanode, along with many
> warning messages like the one below:
> 2024-01-22 15:27:57,500 WARN org.apache.hadoop.hdfs.server.datanode.DataNode:
> hostname:50010:DataXceiverServer
>
> When any exception occurs in DataXceiverServer, it executes closeStream:
> IOUtils.closeStream(peer) -> Peer.close() -> NioInetPeer.close()
> But NioInetPeer.close() does not close the socket connection, and
> this leads to connection leakage.
> The close() of the other Peer subclasses is implemented with socket.close(). See
> EncryptedPeer, DomainPeer, BasicInetPeer.
>
>
> This situation can be reproduced as follows:
> (1) A client writes data to HDFS.
> (2) The datanode Xceiver count reaches DFS_DATANODE_MAX_RECEIVER_THREADS_KEY; the
> new Xceiver fails and throws an IOException, and the socket is not released.
> (3) The client crashes, so that no new data is added, or client.close is
> executed.
> (4) Socket connections are leaked between the datanodes.
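
To illustrate the point about in.close() and out.close() throwing: the following is only a minimal sketch, not the actual NioInetPeer code and not the patch in the PR. It shows one way a close() could still reach socket.close() when closing a stream fails. SketchPeer, SketchPeerDemo, and the failing streams are hypothetical names made up for this example.

```java
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.Socket;

// Hypothetical stand-in for a peer; NOT the real NioInetPeer.
class SketchPeer implements Closeable {
    private final Socket socket;
    private final InputStream in;
    private final OutputStream out;

    SketchPeer(Socket socket, InputStream in, OutputStream out) {
        this.socket = socket;
        this.in = in;
        this.out = out;
    }

    @Override
    public void close() throws IOException {
        IOException first = null;
        // Close each resource in turn; a failure in one must not skip the rest,
        // so the socket is always attempted last.
        for (Closeable c : new Closeable[] {in, out, socket}) {
            try {
                c.close();
            } catch (IOException e) {
                if (first == null) {
                    first = e;
                } else {
                    first.addSuppressed(e);
                }
            }
        }
        // Report the first failure only after every close() has been attempted.
        if (first != null) {
            throw first;
        }
    }
}

class SketchPeerDemo {
    // Streams whose close() always fails, to mimic a datanode under heavy IO load.
    static InputStream failingIn() {
        return new InputStream() {
            @Override public int read() { return -1; }
            @Override public void close() throws IOException {
                throw new IOException("in.close() failed");
            }
        };
    }

    static OutputStream failingOut() {
        return new OutputStream() {
            @Override public void write(int b) { }
            @Override public void close() throws IOException {
                throw new IOException("out.close() failed");
            }
        };
    }

    public static void main(String[] args) {
        Socket socket = new Socket(); // unconnected socket, enough for the demo
        SketchPeer peer = new SketchPeer(socket, failingIn(), failingOut());
        try {
            peer.close();
        } catch (IOException e) {
            System.out.println("close() reported: " + e.getMessage());
        }
        System.out.println("socket closed: " + socket.isClosed());
    }
}
```

In this sketch the caller still sees the first IOException, with the later one attached as a suppressed exception, but the socket is closed either way.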
>
>
> The leaked connections look like this:
> dn1
> dn1:57042 dn2:50010 ESTABLISHED
> dn2
> dn2:50010 dn1:57042 ESTABLISHED



--
This message was sent by Atlassian Jira
(v8.20.10#820010)