[
https://issues.apache.org/jira/browse/HADOOP-3024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Jim Kellerman updated HADOOP-3024:
----------------------------------
Summary: HDFS needs to support a very large number of open files. (was:
DFSClient should implement some kind of socket pooling)
> HDFS needs to support a very large number of open files.
> --------------------------------------------------------
>
> Key: HADOOP-3024
> URL: https://issues.apache.org/jira/browse/HADOOP-3024
> Project: Hadoop Core
> Issue Type: Improvement
> Components: dfs
> Affects Versions: 0.17.0
> Reporter: Jim Kellerman
>
> Currently, DFSClient maintains one socket per open file. For most map/reduce
> operations, this is not a problem because there just aren't many open files.
> However, HBase has a very different usage model in which a single region
> server could have thousands (10**3 but less than 10**4) of open files.
> This can cause both datanodes and region servers to run out of file handles.
> What I would like to see is one connection for each dfsClient, datanode pair.
> This would reduce the number of connections to hundreds or tens of sockets.
> The intent is not to process requests fully asynchronously (overlapping
> block reads and forcing the client to reassemble a whole message out of a
> bunch of fragments), but rather to queue requests from the client to the
> datanode and process them serially. This differs from the current
> implementation in that, rather than using an exclusive socket for each
> file, only one socket is in use between the client and a particular
> datanode.
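The queue-and-serialize scheme described above could be sketched roughly as follows. This is a minimal illustration, not Hadoop code: the class and method names (DatanodeConnectionPool, readBlock) are hypothetical, and the actual socket I/O is simulated; the point is only to show one shared, serially serviced request queue per (client, datanode) pair.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical sketch: one shared connection per (client, datanode)
// pair, with block requests queued and processed serially over it.
public class DatanodeConnectionPool {
    // One single-threaded worker per datanode address; each worker
    // stands in for the one socket shared by all files on that datanode.
    private final Map<String, ExecutorService> workers =
        new ConcurrentHashMap<>();

    // Submit a block read for a given datanode. Requests to the same
    // datanode land on one queue and execute one at a time, so no
    // per-file socket is ever needed.
    public Future<byte[]> readBlock(String datanode, long blockId) {
        ExecutorService worker = workers.computeIfAbsent(
            datanode, d -> Executors.newSingleThreadExecutor());
        return worker.submit(() -> {
            // A real client would write the request on the shared socket
            // and read back the response; here we just fabricate bytes.
            return ("block-" + blockId + "@" + datanode).getBytes();
        });
    }

    public void shutdown() {
        workers.values().forEach(ExecutorService::shutdown);
    }
}
```

Callers still see one request per open file, but the number of sockets now scales with the number of distinct datanodes rather than the number of open files, which is the reduction to tens or hundreds of connections the proposal describes.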
--
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.