[ https://issues.apache.org/jira/browse/HDFS-1325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12894957#action_12894957 ]

jinglong.liujl commented on HDFS-1325:
--------------------------------------

Yes.
The read() function consists of two parts: blockSeekTo() and readBuffer().

1. When the user invokes read(), we check whether the socket has been closed
(which can only be caused by the idle timeout). If it has, or if pos > blockEnd
(as in the existing logic), we call blockSeekTo() to open a new connection to
the DataXceiver, and the user can read data as usual.
2. If the timeout fires in the window between blockSeekTo() and readBuffer(),
readBuffer() will hit an IOException; readBuffer() can then call
seekToBlockSource() to re-establish the connection.

Together, these two mechanisms make the reconnection transparent to the user.
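The two-part recovery described above can be sketched as a small, self-contained simulation. This is not the attached patch: `ReconnectSketch`, its `Connection` stand-in, `connectCount`, and the `raceWindowTimeout` demo flag are hypothetical, and `blockSeekTo()`/`readBuffer()` only mimic the roles of the DFSInputStream methods named in the comment (the retry inside `readBuffer()` plays the part of `seekToBlockSource()`). It also assumes a single block for simplicity.

```java
import java.io.IOException;

/**
 * Hypothetical sketch of the reconnect logic from the comment. Connection,
 * blockSeekTo() and readBuffer() are simplified stand-ins, NOT the real
 * DFSInputStream API. Assumes the file is a single block.
 */
public class ReconnectSketch {

    /** Hypothetical stand-in for a socket to a DataXceiver. */
    static class Connection {
        private boolean closed = false;
        void close() { closed = true; }
        boolean isClosed() { return closed; }
        int read(byte[] data, long pos) throws IOException {
            if (closed) throw new IOException("connection closed by idle timeout");
            return data[(int) pos] & 0xff;
        }
    }

    private final byte[] block;       // contents of the (single) block
    private long pos = 0;
    private long blockEnd = -1;       // -1 means no block selected yet
    private Connection conn = null;
    int connectCount = 0;             // demo only: how many times we (re)connected
    boolean raceWindowTimeout = false; // demo only: simulate timeout after the check

    ReconnectSketch(byte[] block) { this.block = block; }

    /** Opens a fresh connection for the block containing target. */
    private void blockSeekTo(long target) {
        conn = new Connection();
        blockEnd = block.length - 1;
        connectCount++;
    }

    /** Step 2: on IOException, reconnect once and retry the read. */
    private int readBuffer() throws IOException {
        try {
            return conn.read(block, pos);
        } catch (IOException e) {
            blockSeekTo(pos);          // stands in for seekToBlockSource()
            return conn.read(block, pos);
        }
    }

    /** Step 1: reconnect if the socket was closed by the timeout or pos is past blockEnd. */
    public int read() throws IOException {
        if (pos >= block.length) return -1; // EOF
        if (conn == null || conn.isClosed() || pos > blockEnd) {
            blockSeekTo(pos);
        }
        if (raceWindowTimeout) {           // demo only: timeout fires before readBuffer()
            conn.close();
            raceWindowTimeout = false;
        }
        int b = readBuffer();
        pos++;
        return b;
    }

    /** Simulates the idle-timeout checker closing the connection. */
    void simulateIdleTimeout() {
        if (conn != null) conn.close();
    }

    public static void main(String[] args) throws IOException {
        ReconnectSketch in = new ReconnectSketch(new byte[] {10, 20, 30});
        System.out.println(in.read());        // 10 (first connect)
        in.simulateIdleTimeout();
        System.out.println(in.read());        // 20 (reconnected by step 1)
        in.raceWindowTimeout = true;
        System.out.println(in.read());        // 30 (reconnected by step 2)
        System.out.println(in.read());        // -1 (EOF)
        System.out.println(in.connectCount);  // 3
    }
}
```

In both paths the caller of `read()` never sees the closed socket, which is the transparency property the comment aims for.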

> DFSClient(DFSInputStream) release the persistent connection with datanode 
> when no data have been read for a long time
> ---------------------------------------------------------------------------------------------------------------------
>
>                 Key: HDFS-1325
>                 URL: https://issues.apache.org/jira/browse/HDFS-1325
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>          Components: hdfs client
>            Reporter: jinglong.liujl
>             Fix For: 0.20.3
>
>         Attachments: dfsclient.patch, toomanyconnction.patch
>
>
> When using HBase on top of Hadoop, we found that while scanning a large table
> (which has many regions, each with many store files), too many connections
> were kept open between the region server (acting as a DFSClient) and the
> datanodes. Even after a store file has been completely scanned, its
> connections are not closed.
> In our cluster, these extra connections waste a large amount of system
> resources, driving system CPU usage on the region server to a high level and
> eventually bringing the region server down.
> After investigating, we found that the number of active connections is very
> small and most connections are idle. We added a timeout checker thread to
> DFSClient to close these idle connections.
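The timeout checker thread mentioned in the description could look roughly like the following. This is a hypothetical sketch, not the attached patch: `IdleConnectionChecker`, `TrackedConnection`, `checkOnce()`, and the millisecond-based idle timeout are illustrative names and assumptions, not part of the DFSClient API. The check is factored into `checkOnce(now)` so it can be exercised deterministically, while `start()` runs it periodically on a daemon thread via `ScheduledExecutorService`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/**
 * Hypothetical sketch of an idle-connection checker thread. The names here
 * are illustrative and are not part of the DFSClient API.
 */
public class IdleConnectionChecker {

    /** Hypothetical stand-in for a client-to-datanode connection. */
    static class TrackedConnection {
        final String id;
        volatile long lastUsedMillis;
        volatile boolean closed = false;
        TrackedConnection(String id, long now) { this.id = id; this.lastUsedMillis = now; }
        void touch(long now) { lastUsedMillis = now; }
        void close() { closed = true; } // a real client would close the socket here
    }

    private final Map<String, TrackedConnection> conns = new ConcurrentHashMap<>();
    private final long idleTimeoutMillis;
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "idle-connection-checker");
            t.setDaemon(true); // do not keep the JVM alive just for this thread
            return t;
        });

    IdleConnectionChecker(long idleTimeoutMillis) { this.idleTimeoutMillis = idleTimeoutMillis; }

    void register(TrackedConnection c) { conns.put(c.id, c); }

    /** Closes and forgets every connection idle longer than the timeout; returns how many. */
    int checkOnce(long nowMillis) {
        int closedCount = 0;
        // ConcurrentHashMap iterators are weakly consistent, so removing while iterating is safe.
        for (TrackedConnection c : conns.values()) {
            if (nowMillis - c.lastUsedMillis > idleTimeoutMillis) {
                c.close();
                conns.remove(c.id);
                closedCount++;
            }
        }
        return closedCount;
    }

    /** Starts the background checker, running checkOnce at a fixed period. */
    void start(long periodMillis) {
        scheduler.scheduleAtFixedRate(
            () -> checkOnce(System.currentTimeMillis()),
            periodMillis, periodMillis, TimeUnit.MILLISECONDS);
    }

    void stop() { scheduler.shutdownNow(); }

    int openCount() { return conns.size(); }

    public static void main(String[] args) {
        IdleConnectionChecker cc = new IdleConnectionChecker(1000);
        TrackedConnection idle = new TrackedConnection("dn1", 0);
        TrackedConnection busy = new TrackedConnection("dn2", 0);
        cc.register(idle);
        cc.register(busy);
        busy.touch(1500);                                    // dn2 was read from recently
        System.out.println(cc.checkOnce(2000));              // 1 (only dn1 is closed)
        System.out.println(idle.closed + " " + busy.closed); // true false
    }
}
```

Because the checker can close a connection at any moment, a reader may still find its socket closed between checking it and reading from it; that race is exactly what the IOException-triggered reconnect in readBuffer() is meant to cover.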

-- 
This message is automatically generated by JIRA.