[ 
https://issues.apache.org/jira/browse/HDFS-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895514#action_12895514
 ] 

jinglong.liujl commented on HDFS-1325:
--------------------------------------

>Yes. Following from this direction, we probably should limit the number of 
>open files, like the file descriptor limit in Unix.

Of course, closing files is necessary. But if a user doesn't close a file, or 
there are bugs in their application, then as a distributed system we should 
still keep the service running, right?


>In the patch, a new TimeoutChecker thread is started for each DFSInputStream. 
>It is very expensive. All clients, idle or not, have to pay for it.

Yes, creating a new thread is not cheap on a single machine, but I think we 
should look at where the bottleneck in our system is. If the number of 
connections can bring a machine down, a watchdog thread (TimeoutChecker) will 
save it. Certainly, moving the check into LeaseChecker or another existing 
thread is OK, but that would make the code structure less clear.
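
To make the trade-off concrete, here is a minimal sketch of the shared-thread 
alternative: one daemon thread scans all registered connections, instead of one 
TimeoutChecker per DFSInputStream. This is not the code in the attached patch. 
The class IdleConnectionReaper and its methods touch, forget, closeIdle, start 
and stop are hypothetical names, and a plain java.io.Closeable stands in for 
the datanode connection held by a stream.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical shared reaper: one thread for all streams, not one per stream. */
final class IdleConnectionReaper {
    // Last read time (ms) per tracked connection.
    private final Map<Closeable, Long> lastReadMillis = new ConcurrentHashMap<>();
    private final long idleTimeoutMillis;
    private ScheduledExecutorService scheduler; // created by start()

    IdleConnectionReaper(long idleTimeoutMillis) {
        this.idleTimeoutMillis = idleTimeoutMillis;
    }

    /** Record activity; call when a connection opens and after each read. */
    void touch(Closeable conn, long nowMillis) {
        lastReadMillis.put(conn, nowMillis);
    }

    /** Stop tracking a connection that its owner closed itself. */
    void forget(Closeable conn) {
        lastReadMillis.remove(conn);
    }

    /** Close every connection idle for at least the timeout; return how many. */
    int closeIdle(long nowMillis) {
        int closed = 0;
        Iterator<Map.Entry<Closeable, Long>> it = lastReadMillis.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<Closeable, Long> e = it.next();
            if (nowMillis - e.getValue() >= idleTimeoutMillis) {
                it.remove();
                try {
                    e.getKey().close();
                } catch (IOException ignored) {
                    // Best effort: the connection is being dropped anyway.
                }
                closed++;
            }
        }
        return closed;
    }

    /** Start the single background thread that runs closeIdle periodically. */
    synchronized void start(long periodMillis) {
        if (scheduler != null) {
            return;
        }
        scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "idle-connection-reaper");
            t.setDaemon(true); // must not keep the client JVM alive
            return t;
        });
        scheduler.scheduleWithFixedDelay(
                () -> closeIdle(System.currentTimeMillis()),
                periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    /** Stop the background thread; tracked connections are left untouched. */
    synchronized void stop() {
        if (scheduler != null) {
            scheduler.shutdownNow();
            scheduler = null;
        }
    }
}
```

With this shape, idle clients pay only for a map entry rather than a thread. 
The sketch assumes a stream can reopen its connection on the next read, and it 
ignores the race between a read and a concurrent close, which a real change 
would have to handle with the stream's own locking.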


> DFSClient(DFSInputStream) release the persistent connection with datanode 
> when no data have been read for a long time
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-1325
>                 URL: https://issues.apache.org/jira/browse/HDFS-1325
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs client
>            Reporter: jinglong.liujl
>             Fix For: 0.20.3
>
>         Attachments: dfsclient.patch, toomanyconnction.patch
>
>
> When using HBase over Hadoop, we found that while scanning a large table 
> (which has many regions, each with many store files), too many connections 
> are kept open between the regionserver (acting as a DFSClient) and the 
> datanodes. Even after a store file has been completely scanned, its 
> connection is not closed.
> In our cluster, these extra connections waste so many system resources that 
> system CPU on the region server climbs to a high level and eventually brings 
> the region server down.
> After investigating, we found that the number of active connections is very 
> small and most connections are idle. We added a timeout checker thread to 
> DFSClient to close these idle connections.
