On Aug 19, 2008, at 10:57 PM, Dhruba Borthakur wrote:
For data transfer, large volumes of data need to be moved, so HDFS uses a streaming protocol between the client and the datanodes.
Another way to think about it is that Hadoop's RPC is not designed for large data transfers, so we had to use a custom protocol. A similar decision was made in map/reduce's shuffle, which uses Jetty and HTTP. It is entirely possible that the RPC will be extended to support large transfers, and moving to RPC would help simplify a lot of the datanode and HDFS client code.
-- Owen

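To make the distinction concrete, here is a minimal, hypothetical Java sketch of what a streaming read looks like from the client side: one small request header goes out over a TCP socket, and the block contents come back as a raw byte stream rather than as a single marshalled RPC response. The host name, port, opcode, and block id below are invented for illustration and do not reflect the real DataTransferProtocol wire format.

// Illustrative sketch only: a hypothetical client-side block read over a
// plain TCP stream, loosely in the spirit of HDFS's streaming data transfer.
// The framing (opcode byte + block id) is made up for this example.
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.net.Socket;

public class StreamingReadSketch {
  public static void main(String[] args) throws IOException {
    try (Socket s = new Socket("datanode.example.com", 50010)) {
      DataOutputStream out = new DataOutputStream(s.getOutputStream());
      DataInputStream in = new DataInputStream(s.getInputStream());

      // One small request header, then a long stream of raw bytes back.
      // This is the key difference from an RPC call, which would have to
      // marshal the whole payload into one response object in memory.
      out.writeByte(81);            // hypothetical "read block" opcode
      out.writeLong(123456789L);    // hypothetical block id
      out.flush();

      byte[] buf = new byte[64 * 1024];
      OutputStream sink = OutputStream.nullOutputStream();
      long total = 0;
      int n;
      while ((n = in.read(buf)) != -1) {
        sink.write(buf, 0, n);      // consume the streamed block data
        total += n;
      }
      System.out.println("streamed " + total + " bytes");
    }
  }
}

The point of the sketch is that the receive loop works on fixed-size buffers, so the client's memory use stays constant no matter how large the block is, which is exactly what a request/response RPC layer would not give you without the extensions Owen mentions.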