[ https://issues.apache.org/jira/browse/HDFS-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894870#action_12894870 ]
jinglong.liujl commented on HDFS-1325:
--------------------------------------
Thank you for the recommendation. HDFS-941 can fix our issue.
In fact, this timeout is a last-resort way to release a connection. For example,
if a user (DFSClient) has not read any data for one hour (the timeout we
currently assume), we can release the connection between the DFSClient and the
datanode. For an idle user, the platform can then reclaim the resources that
user requested earlier.
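A minimal sketch of the idle-timeout rule described above, assuming idleness is measured from the time of the last successful read. `IdleTracker`, `recordRead` and `isIdle` are hypothetical names, not taken from dfsclient.patch or HDFS-941; the one-hour default only mirrors the timeout mentioned in this comment.

```java
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical helper (not from the HDFS-1325 patches): remembers when the
 * client last read from a datanode connection and reports whether the
 * connection has gone unread for at least the configured timeout.
 */
final class IdleTracker {
    // Assumed default, taken from the "one hour" example in the comment.
    static final long DEFAULT_IDLE_TIMEOUT_MS = TimeUnit.HOURS.toMillis(1);

    private final long idleTimeoutMs;
    // Written by the reading thread, read by a checker thread, hence volatile.
    private volatile long lastReadMs;

    IdleTracker(long idleTimeoutMs, long nowMs) {
        if (idleTimeoutMs <= 0) {
            throw new IllegalArgumentException("idleTimeoutMs must be positive");
        }
        this.idleTimeoutMs = idleTimeoutMs;
        this.lastReadMs = nowMs; // a freshly opened connection counts as active
    }

    /** Call after every successful read on the connection. */
    void recordRead(long nowMs) {
        lastReadMs = nowMs;
    }

    /** True once no data has been read for at least the timeout. */
    boolean isIdle(long nowMs) {
        return nowMs - lastReadMs >= idleTimeoutMs;
    }
}
```

Passing the current time in explicitly (for example `System.currentTimeMillis()`) keeps the rule easy to test without sleeping for an hour.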
In our case, the DFSClient requests some data, and the datanode sends a whole
block (an HDFS block) to the client; the extra data is left in the receive and
send queues. In our cluster, thousands of connections were kept open in the
region server and socket memory was overused, so a socket OOM brought the
machine down (as seen in dmesg).
This issue can be treated as a duplicate. Thanks again.
> DFSClient(DFSInputStream) release the persistent connection with datanode
> when no data have been read for a long time
> ---------------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-1325
> URL: https://issues.apache.org/jira/browse/HDFS-1325
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs client
> Reporter: jinglong.liujl
> Fix For: 0.20.3
>
> Attachments: dfsclient.patch, toomanyconnction.patch
>
>
> When using HBase over Hadoop, we found that while scanning a large table
> (which has many regions, each with many store files), too many connections
> were kept open between the region server (acting as the DFSClient) and the
> datanode. Even after a store file has been completely scanned, its
> connections are not closed.
> In our cluster, these extra connections wasted a large amount of system
> resources, which drove system CPU on the region server to a high level and
> eventually brought the region server down.
> After investigating, we found that the number of active connections is very
> small and most connections are idle. We added a timeout checker thread to
> DFSClient to close these idle connections.
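A hedged sketch of the kind of timeout checker thread the description mentions, assuming each datanode connection can be treated as a `java.io.Closeable` and that the reading code reports activity via `touch`. `IdleConnectionReaper`, `register`, `touch` and `closeIdle` are hypothetical names; this is not the code in dfsclient.patch or toomanyconnction.patch.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;

/**
 * Hypothetical sketch: a single background thread periodically closes any
 * registered connection that has had no reads for at least idleTimeoutMs.
 */
final class IdleConnectionReaper implements AutoCloseable {
    // Connection -> time (ms) of its last read.
    private final Map<Closeable, Long> lastReadMs = new ConcurrentHashMap<>();
    private final long idleTimeoutMs;
    private final LongSupplier clock; // injectable so tests can fake time
    private final ScheduledExecutorService checker;

    IdleConnectionReaper(long idleTimeoutMs, long checkIntervalMs, LongSupplier clock) {
        this.idleTimeoutMs = idleTimeoutMs;
        this.clock = clock;
        this.checker = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "idle-connection-reaper");
            t.setDaemon(true); // do not keep the JVM alive just for this thread
            return t;
        });
        checker.scheduleWithFixedDelay(this::closeIdle,
                checkIntervalMs, checkIntervalMs, TimeUnit.MILLISECONDS);
    }

    /** Start tracking a newly opened connection. */
    void register(Closeable conn) {
        lastReadMs.put(conn, clock.getAsLong());
    }

    /** Record a read; ignored if the connection was already reaped. */
    void touch(Closeable conn) {
        lastReadMs.replace(conn, clock.getAsLong());
    }

    /** Closes and unregisters idle connections; returns how many were closed. */
    int closeIdle() {
        long now = clock.getAsLong();
        int closed = 0;
        for (Map.Entry<Closeable, Long> e : lastReadMs.entrySet()) {
            Closeable conn = e.getKey();
            long last = e.getValue();
            // remove(key, value) fails if a read updated the timestamp
            // meanwhile, so a connection that just became active is kept.
            if (now - last >= idleTimeoutMs && lastReadMs.remove(conn, last)) {
                try {
                    conn.close();
                } catch (IOException ignored) {
                    // the connection is being discarded anyway
                }
                closed++;
            }
        }
        return closed;
    }

    @Override
    public void close() {
        checker.shutdownNow();
    }
}
```

Using one shared scheduled thread, rather than a timer per stream, keeps the checker's own cost constant even with thousands of open connections; the conditional `remove` avoids closing a connection that was read from while the scan was running.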
--
This message is automatically generated by JIRA.