Re: Multiplexing sockets in DFSClient/datanodes?

Sanjay Radia Wed, 12 Mar 2008 11:37:07 -0700

Doug Cutting wrote:

Jim Kellerman wrote:
Yes, multiplexing a socket is more complicated than having one socket
per file, but saving system resources seems like a way to scale.
Questions? Comments? Opinions? Flames?
Note that Hadoop RPC already multiplexes, sharing a single socket per pair of JVMs. It would be possible to multiplex datanode, and should not in theory significantly impact performance, but, as you indicate, it would be a significant change. One approach might be to implement HDFS data access using RPC rather than directly using stream i/o.
RPC also tears down idle connections, which HDFS does not. I wonder how much doing that alone might help your case? That would probably be much simpler to implement. Both client and server must already handle connection failures, so it shouldn't be too great of a change to have one or both sides actively close things down if they're idle for more than a few seconds. This is related to adding write timeouts to the datanode (HADOOP-2346).
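
[Editorial illustration, not part of the original message.] The idle-teardown idea described above can be sketched roughly as follows. This is a minimal sketch in Python and not Hadoop code: the class name IdleConnectionPool, its methods, and the injectable clock parameter are all hypothetical, and it only assumes that each connection object is hashable and has a close() method.

```python
import time


class IdleConnectionPool:
    """Hypothetical helper: closes connections idle longer than a threshold."""

    def __init__(self, max_idle_seconds, clock=time.monotonic):
        self.max_idle_seconds = max_idle_seconds
        self.clock = clock          # injectable so the logic can be tested
        self.last_used = {}         # connection -> time of last activity

    def touch(self, conn):
        """Record activity on conn; call this on every read or write."""
        self.last_used[conn] = self.clock()

    def close_idle(self):
        """Close and forget every connection idle past the threshold."""
        now = self.clock()
        closed = []
        for conn, stamp in list(self.last_used.items()):
            if now - stamp > self.max_idle_seconds:
                conn.close()
                del self.last_used[conn]
                closed.append(conn)
        return closed
```

A periodic timer on either side would call close_idle(); the peer then sees an ordinary connection failure and reconnects on next use, which relies on the existing failure handling mentioned above.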


Doug,

Dhruba and I had discussed using RPC in the past. While RPC is a cleaner interface and our RPC implementation has features such as connection sharing and closing idle connections, streaming IO lets us pipe large amounts of data without the request/response exchange.
The worry was that IO performance would degrade.
BTW, NFS uses RPC (NFS does not have the write pipeline for replicas).

sanjay


Doug
