[ https://issues.apache.org/jira/browse/HDFS-374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Allen Wittenauer resolved HDFS-374.
-----------------------------------
    Resolution: Fixed

I'm going to resolve this as stale. There is a good chance this issue still exists, but it isn't nearly the concern it once was. If it is, please open a new JIRA.

> HDFS needs to support a very large number of open files.
> --------------------------------------------------------
>
>                 Key: HDFS-374
>                 URL: https://issues.apache.org/jira/browse/HDFS-374
>             Project: Hadoop HDFS
>          Issue Type: Improvement
>            Reporter: Jim Kellerman
>
> Currently, DFSClient maintains one socket per open file. For most map/reduce
> operations this is not a problem, because there just aren't many open files.
> However, HBase has a very different usage model, in which a single region
> server could have thousands (10**3 but less than 10**4) of open files.
> This can cause both datanodes and region servers to run out of file handles.
> What I would like to see is one connection for each DFSClient/datanode pair.
> This would reduce the number of connections to tens or hundreds of sockets.
> The intent is not to process requests totally asynchronously (overlapping
> block reads and forcing the client to reassemble a whole message out of a
> bunch of fragments), but rather to queue requests from the client to the
> datanode and process them serially. This differs from the current
> implementation in that, rather than using an exclusive socket for each file,
> only one socket is in use between the client and a particular datanode.

--
This message was sent by Atlassian JIRA
(v6.2#6252)
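The multiplexing the reporter proposes can be sketched roughly as follows. This is a minimal illustrative model, not HDFS code: the class names (`MultiplexingClient`, `DatanodeConnection`) and the string-keyed pool are assumptions for the sketch, and a plain `synchronized` method stands in for the per-connection request queue that serializes reads over the single shared socket.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of one shared connection per (client, datanode) pair.
// Names and structure are illustrative only; real HDFS APIs differ.
public class MultiplexingClient {

    // Stand-in for a single socket to one datanode. Requests from many open
    // files are funneled through it and handled one at a time (serially).
    static class DatanodeConnection {
        final String datanodeAddr;
        int requestsServed = 0;

        DatanodeConnection(String addr) { this.datanodeAddr = addr; }

        // synchronized = a crude serial request queue over the shared socket.
        synchronized String readBlock(String blockId) {
            requestsServed++;
            return "data-for-" + blockId + "@" + datanodeAddr;
        }
    }

    // One entry per datanode, regardless of how many files are open.
    private final Map<String, DatanodeConnection> pool = new ConcurrentHashMap<>();

    DatanodeConnection connectionTo(String datanodeAddr) {
        // Reuse the existing connection if one is already open.
        return pool.computeIfAbsent(datanodeAddr, DatanodeConnection::new);
    }

    int openConnections() { return pool.size(); }

    public static void main(String[] args) {
        MultiplexingClient client = new MultiplexingClient();
        // Thousands of open "files" spread over just two datanodes...
        for (int i = 0; i < 1000; i++) {
            client.connectionTo("dn1:50010").readBlock("blk_" + i);
            client.connectionTo("dn2:50010").readBlock("blk_" + i);
        }
        // ...require only two sockets instead of two thousand.
        System.out.println(client.openConnections());
    }
}
```

The trade-off described in the issue is visible here: serializing requests on one socket gives up concurrency between block reads to the same datanode, but keeps file-handle usage proportional to the number of datanodes rather than the number of open files.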