DFSClient should implement some kind of socket pooling
------------------------------------------------------
Key: HADOOP-3024
URL: https://issues.apache.org/jira/browse/HADOOP-3024
Project: Hadoop Core
Issue Type: Improvement
Components: dfs
Affects Versions: 0.17.0
Reporter: Jim Kellerman
Currently, DFSClient maintains one socket per open file. For most map/reduce
operations, this is not a problem because there just aren't many open files.
However, HBase has a very different usage model in which a single region
server could have thousands (more than 10**3 but fewer than 10**4) of open
files. This can cause both datanodes and region servers to run out of file
handles.
What I would like to see is one connection for each dfsClient, datanode pair.
This would reduce the number of connections from thousands to tens or
hundreds of sockets.
The intent is not to process requests fully asynchronously (overlapping block
reads and forcing the client to reassemble a whole message from a bunch of
fragments), but rather to queue requests from the client to the datanode and
process them serially. This differs from the current implementation in that,
rather than using an exclusive socket for each file, only one socket is in
use between the client and a particular datanode.
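A minimal sketch of the keyed-pool idea described above, in Java. All names
here (ConnectionPool, the factory parameter) are hypothetical and not part of
DFSClient; the sketch only shows the invariant that all opens against the
same datanode share one connection, with access serialized through the pool's
lock rather than one socket per file.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch: at most one pooled connection per datanode key.
// Requests routed through the shared connection would be queued and
// processed serially, as proposed in the issue.
class ConnectionPool<C> {
    private final Map<String, C> pool = new HashMap<>();
    private final Function<String, C> factory;

    ConnectionPool(Function<String, C> factory) {
        this.factory = factory;
    }

    // Returns the single shared connection for a datanode,
    // creating it on first use.
    synchronized C get(String datanode) {
        return pool.computeIfAbsent(datanode, factory);
    }

    // Number of live connections: bounded by the number of
    // distinct datanodes, not by the number of open files.
    synchronized int size() {
        return pool.size();
    }
}
```

With this shape, a region server holding thousands of open files against a
handful of datanodes consumes only a handful of sockets, since repeated
`get("dn1")` calls return the same connection object.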