Doug Cutting wrote:
Jim Kellerman wrote:
Yes, multiplexing a socket is more complicated than having one socket
per file, but saving system resources seems like a way to scale.
Questions? Comments? Opinions? Flames?
Datanode needs async io for disk reads and writes as well. How well does
Java NIO support async disk io?
As Doug mentioned, will HADOOP-2346 do for now? In fact, the write
timeout can be made configurable.
Raghu.
Note that Hadoop RPC already multiplexes, sharing a single socket per
pair of JVMs. It would be possible to multiplex datanode connections
too, and in theory that should not significantly impact performance,
but, as you indicate, it would be a significant change. One approach
might be to implement HDFS data access using RPC rather than directly
using stream i/o.
RPC also tears down idle connections, which HDFS does not. I wonder how
much doing that alone might help your case? That would probably be much
simpler to implement. Both client and server must already handle
connection failures, so it shouldn't be too great of a change to have
one or both sides actively close things down if they're idle for more
than a few seconds. This is related to adding write timeouts to the
datanode (HADOOP-2346).
Doug
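
The idle-teardown idea in Doug's message could be sketched roughly as
follows. This is only an illustration under stated assumptions, not
Hadoop's RPC code and not the HADOOP-2346 patch: the class name
IdleConnectionReaper, its methods, and the 3000 ms threshold used in
the usage note are all hypothetical. Each side records when a
connection was last used, and a periodic sweep closes anything that has
been idle longer than the limit. Timestamps are passed in explicitly so
the behavior is easy to reason about.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical helper (not a Hadoop class): closes connections idle too long. */
final class IdleConnectionReaper {

    /** A tracked connection plus the time it was last used. */
    private static final class Tracked {
        final Closeable conn;
        volatile long lastUsedMillis;

        Tracked(Closeable conn, long nowMillis) {
            this.conn = conn;
            this.lastUsedMillis = nowMillis;
        }
    }

    private final Map<String, Tracked> tracked = new ConcurrentHashMap<>();
    private final long maxIdleMillis;

    IdleConnectionReaper(long maxIdleMillis) {
        this.maxIdleMillis = maxIdleMillis;
    }

    /** Start tracking a connection (a Socket is a Closeable, for example). */
    void register(String id, Closeable conn, long nowMillis) {
        tracked.put(id, new Tracked(conn, nowMillis));
    }

    /** Record activity on a connection; call this on every read or write. */
    void touch(String id, long nowMillis) {
        Tracked t = tracked.get(id);
        if (t != null) {
            t.lastUsedMillis = nowMillis;
        }
    }

    /** Close and forget every connection idle for more than maxIdleMillis. */
    List<String> closeIdle(long nowMillis) {
        List<String> closed = new ArrayList<>();
        Iterator<Map.Entry<String, Tracked>> it = tracked.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Tracked> e = it.next();
            if (nowMillis - e.getValue().lastUsedMillis > maxIdleMillis) {
                try {
                    e.getValue().conn.close();
                } catch (IOException ignored) {
                    // Both sides must already cope with connection failures,
                    // so a failed close is not treated as fatal here.
                }
                it.remove();
                closed.add(e.getKey());
            }
        }
        return closed;
    }

    int size() {
        return tracked.size();
    }
}
```

In use, one would construct it with the idle limit (say
new IdleConnectionReaper(3000)), call touch() from the read and write
paths, and call closeIdle(System.currentTimeMillis()) from a timer
thread. The peer then sees an ordinary closed connection and reopens on
demand, which is the failure path both client and server already
handle.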