-----Original Message-----
From: dhruba Borthakur [mailto:[EMAIL PROTECTED]
Sent: Friday, March 14, 2008 10:53 AM
To: [email protected]; [EMAIL PROTECTED]
Subject: RE: Multiplexing sockets in DFSClient/datanodes?
Hi Jim,
The protocol between the client and the Datanodes will become
relatively more complex if we decide to multiplex
simultaneous transfers of multiple blocks on the same socket
connection. Do you think that the benefit of saving on system
resources is really appreciable?
Thanks,
Dhruba
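To make the added protocol complexity concrete, here is a minimal sketch of what multiplexing several block transfers over one connection could look like. It is not the DFSClient/datanode wire protocol. The frame layout (a 4-byte stream id, a 4-byte length, then the payload) and the names MuxSketch, writeFrame and demux are assumptions made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

// Hypothetical framing for illustration only; not the HDFS data transfer protocol.
final class MuxSketch {

    // Writes one frame: [int streamId][int length][payload bytes].
    static void writeFrame(DataOutputStream out, int streamId, byte[] chunk) throws IOException {
        out.writeInt(streamId);
        out.writeInt(chunk.length);
        out.write(chunk);
    }

    // Reads frames until end of stream and reassembles each transfer by stream id.
    // A real receiver would also have to treat a frame cut off mid-header as an error;
    // this sketch simply stops reading at the first EOF on a stream id.
    static Map<Integer, byte[]> demux(DataInputStream in) throws IOException {
        Map<Integer, ByteArrayOutputStream> buffers = new HashMap<>();
        while (true) {
            int streamId;
            try {
                streamId = in.readInt();
            } catch (EOFException endOfStream) {
                break;
            }
            int length = in.readInt();
            byte[] chunk = new byte[length];
            in.readFully(chunk);
            buffers.computeIfAbsent(streamId, id -> new ByteArrayOutputStream())
                   .write(chunk, 0, chunk.length);
        }
        Map<Integer, byte[]> result = new HashMap<>();
        for (Map.Entry<Integer, ByteArrayOutputStream> e : buffers.entrySet()) {
            result.put(e.getKey(), e.getValue().toByteArray());
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // An in-memory buffer stands in for the shared socket.
        ByteArrayOutputStream wire = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(wire);

        // Two transfers interleaved on the same "connection".
        writeFrame(out, 1, "block-A part 1, ".getBytes(StandardCharsets.UTF_8));
        writeFrame(out, 2, "block-B part 1, ".getBytes(StandardCharsets.UTF_8));
        writeFrame(out, 1, "block-A part 2".getBytes(StandardCharsets.UTF_8));
        writeFrame(out, 2, "block-B part 2".getBytes(StandardCharsets.UTF_8));
        out.flush();

        Map<Integer, byte[]> transfers =
            demux(new DataInputStream(new ByteArrayInputStream(wire.toByteArray())));
        for (Map.Entry<Integer, byte[]> e : transfers.entrySet()) {
            System.out.println(e.getKey() + ": " + new String(e.getValue(), StandardCharsets.UTF_8));
        }
    }
}
```

Even this toy version needs a per-stream buffer on the receiving side, and it leaves out flow control, error reporting for a single stream, and fairness between transfers, which is roughly where the extra complexity would come from.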
-----Original Message-----
From: Sanjay Radia [mailto:[EMAIL PROTECTED]
Sent: Wednesday, March 12, 2008 11:36 AM
To: [EMAIL PROTECTED]
Subject: Re: Multiplexing sockets in DFSClient/datanodes?
Doug Cutting wrote:
Jim Kellerman wrote:
Yes, multiplexing a socket is more complicated than having one socket
per file, but saving system resources seems like a way to scale.
Questions? Comments? Opinions? Flames?
Note that Hadoop RPC already multiplexes, sharing a single socket per
pair of JVMs. It would be possible to multiplex datanode, and should
not in theory significantly impact performance, but, as you indicate,
it would be a significant change. One approach might be to implement
HDFS data access using RPC rather than directly using stream i/o.
RPC also tears down idle connections, which HDFS does not. I wonder
how much doing that alone might help your case? That would probably
be much simpler to implement. Both client and server must already
handle connection failures, so it shouldn't be too great of a change
to have one or both sides actively close things down if they're idle
for more than a few seconds. This is related to adding write timeouts
to the datanode (HADOOP-2346).
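The idle-teardown idea above can be sketched roughly as follows. This is not Hadoop RPC's or the datanode's actual code. The class IdleConnectionReaper, its methods, and the 3-second threshold in the usage lines are all hypothetical, and timestamps are passed in explicitly so the logic can be followed without a real clock or real sockets.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.Iterator;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper for illustration; either side of a connection could run one.
final class IdleConnectionReaper {

    private static final class Tracked {
        final Closeable connection;
        volatile long lastUsedMillis;

        Tracked(Closeable connection, long nowMillis) {
            this.connection = connection;
            this.lastUsedMillis = nowMillis;
        }
    }

    private final Map<String, Tracked> connections = new ConcurrentHashMap<>();
    private final long maxIdleMillis;

    IdleConnectionReaper(long maxIdleMillis) {
        this.maxIdleMillis = maxIdleMillis;
    }

    // Starts tracking a connection (for example a Socket, which is Closeable).
    void register(String id, Closeable connection, long nowMillis) {
        connections.put(id, new Tracked(connection, nowMillis));
    }

    // Records activity; call this whenever data is read from or written to the connection.
    void touch(String id, long nowMillis) {
        Tracked t = connections.get(id);
        if (t != null) {
            t.lastUsedMillis = nowMillis;
        }
    }

    // Closes and forgets every connection idle for longer than maxIdleMillis.
    // Returns how many connections were closed.
    int closeIdle(long nowMillis) {
        int closed = 0;
        Iterator<Map.Entry<String, Tracked>> it = connections.entrySet().iterator();
        while (it.hasNext()) {
            Tracked t = it.next().getValue();
            if (nowMillis - t.lastUsedMillis > maxIdleMillis) {
                try {
                    t.connection.close();
                } catch (IOException ignored) {
                    // The peer must already cope with connection failures, so a failed close is not fatal.
                }
                it.remove();
                closed++;
            }
        }
        return closed;
    }

    int size() {
        return connections.size();
    }
}
```

In use, a background thread would call something like reaper.closeIdle(System.currentTimeMillis()) once a second, with the reaper constructed as new IdleConnectionReaper(3000). The side that finds its connection closed simply reconnects on the next request, which relies on the failure handling both client and server already have.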
Doug,
Dhruba and I had discussed using RPC in the past. While
RPC is a cleaner interface and our RPC implementation has
features such as connection sharing and closing idle connections,
streaming IO lets us pipe large amounts of data without
the request/response exchange.
The worry was that IO performance would degrade.
BTW, NFS uses RPC (NFS does not have the write pipeline for replicas).
sanjay
Doug