[
https://issues.apache.org/jira/browse/HDFS-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895514#action_12895514
]
jinglong.liujl commented on HDFS-1325:
--------------------------------------
>Yes. Following from this direction, we probably should limit the number of
>open files, like the file descriptor limit in Unix.
Of course, closing files is necessary, but if a user doesn't close a file, or
there are bugs in their application, then as a distributed system we should
still keep the service running, right?
>In the patch, a new TimeoutChecker thread is started for each DFSInputStream.
>It is very expensive. All clients, idle or not, have to pay for it.
Yes, creating a new thread is not cheap on a single machine, but I think we
should look at where the bottleneck in our system actually is. If the number
of connections can bring a machine down, a watchdog thread (TimeoutChecker)
will save it. Certainly, moving the check into LeaseChecker or another
existing thread is OK, but it makes the code structure less clear.
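
To make the thread-cost trade-off concrete, here is a minimal sketch of the
alternative raised above: one shared watchdog thread for all streams instead
of one TimeoutChecker per DFSInputStream. This is not the code in the attached
patch. IdleConnectionReaper, touch, forget, reapIdle and start are
hypothetical names, and a plain java.io.Closeable stands in for a datanode
connection.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of a shared idle-connection watchdog; not HDFS code. */
public class IdleConnectionReaper {
    // Last-activity timestamp (milliseconds) for every tracked connection.
    private final Map<Closeable, Long> lastUsed = new ConcurrentHashMap<>();
    private final long idleTimeoutMillis;

    public IdleConnectionReaper(long idleTimeoutMillis) {
        this.idleTimeoutMillis = idleTimeoutMillis;
    }

    /** Record activity on a connection; call on open and after each read. */
    public void touch(Closeable conn, long nowMillis) {
        lastUsed.put(conn, nowMillis);
    }

    /** Stop tracking a connection that its owner closed itself. */
    public void forget(Closeable conn) {
        lastUsed.remove(conn);
    }

    /** Close every connection idle for at least the timeout; return the count. */
    public int reapIdle(long nowMillis) {
        int closed = 0;
        for (Map.Entry<Closeable, Long> e : lastUsed.entrySet()) {
            Closeable conn = e.getKey();
            Long seen = e.getValue();
            // remove(key, value) only succeeds if nobody touched the
            // connection since we read its timestamp.
            if (nowMillis - seen >= idleTimeoutMillis && lastUsed.remove(conn, seen)) {
                try {
                    conn.close();
                } catch (IOException ignored) {
                    // Best effort: the connection is being discarded anyway.
                }
                closed++;
            }
        }
        return closed;
    }

    /** Start a single daemon thread that reaps idle connections periodically. */
    public ScheduledExecutorService start(long periodMillis) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "idle-connection-reaper");
            t.setDaemon(true);
            return t;
        });
        ses.scheduleWithFixedDelay(() -> reapIdle(System.currentTimeMillis()),
                periodMillis, periodMillis, TimeUnit.MILLISECONDS);
        return ses;
    }
}
```

In this sketch a stream would call touch() on open and after each read, and
forget() from its own close(). A real client would also need to reopen the
datanode connection lazily on the next read after a reap, which is not
covered here.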
> DFSClient(DFSInputStream) release the persistent connection with datanode
> when no data have been read for a long time
> ---------------------------------------------------------------------------------------------------------------------
>
> Key: HDFS-1325
> URL: https://issues.apache.org/jira/browse/HDFS-1325
> Project: Hadoop HDFS
> Issue Type: Improvement
> Components: hdfs client
> Reporter: jinglong.liujl
> Fix For: 0.20.3
>
> Attachments: dfsclient.patch, toomanyconnction.patch
>
>
> When running HBase over Hadoop, we found that while scanning a large table
> (which has many regions, each with many store files), too many connections
> are kept open between the regionserver (acting as a DFSClient) and the
> datanodes. Even after a store file has been completely scanned, its
> connections are not closed.
> In our cluster, these extra connections waste so many system resources that
> system CPU on the region server climbs to a high level and then brings the
> region server down.
> After investigating, we found that the number of active connections is very
> small and most connections are idle. We added a timeout checker thread to
> DFSClient to close these idle connections.