Parallel data/socket writing for DFSOutputStream
------------------------------------------------
Key: HADOOP-445
URL: http://issues.apache.org/jira/browse/HADOOP-445
Project: Hadoop
Issue Type: Improvement
Affects Versions: 0.5.0
Reporter: Benjamin Reed
Attachments: fastClientWrite.patch
Currently, as DFS clients output blocks, they write the entire block to disk
before starting to transmit it to the datanode. Writing to disk first lets the
client retry a block write if the datanode fails in the middle of a block
transfer, but writing to disk and only then to the datanode adds latency.
Hopefully, the common case is that block transfers to datanodes succeed. This
patch writes to the datanode and the disk in parallel. If the write to the
datanode fails, it falls back to the current behavior.
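The fast path and its fallback can be sketched in Java roughly as follows. This is a minimal sketch under stated assumptions, not the code in fastClientWrite.patch. `ParallelBlockWriter` and its `DatanodeConnector` interface are hypothetical names. "Parallel" is approximated by writing each chunk to the local backup file and then to the datanode stream in the same call, with no extra threads. If the datanode stream fails, writing continues to disk only, and `close()` replays the whole block from the local copy over a fresh connection.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

/**
 * Hypothetical sketch (not the patch's code): tees each chunk of a block to a
 * local backup file and to a datanode stream. On a datanode failure it keeps
 * writing to disk only, and close() retransmits the whole block from the file,
 * which is the pre-patch behavior.
 */
class ParallelBlockWriter extends OutputStream {

    /** Hypothetical hook for opening a stream to a datanode. */
    public interface DatanodeConnector {
        OutputStream open() throws IOException;
    }

    private final Path backupFile;
    private final OutputStream backup;          // local copy, always written
    private final DatanodeConnector connector;
    private OutputStream datanode;              // null once the fast path has failed

    ParallelBlockWriter(Path backupFile, DatanodeConnector connector) throws IOException {
        this.backupFile = backupFile;
        this.backup = Files.newOutputStream(backupFile);
        this.connector = connector;
        try {
            this.datanode = connector.open();
        } catch (IOException e) {
            this.datanode = null;               // no fast path; close() will replay
        }
    }

    @Override
    public void write(int b) throws IOException {
        write(new byte[] {(byte) b}, 0, 1);
    }

    @Override
    public void write(byte[] buf, int off, int len) throws IOException {
        backup.write(buf, off, len);            // the disk copy stays complete
        if (datanode != null) {
            try {
                datanode.write(buf, off, len);  // overlap transmission with the disk write
            } catch (IOException e) {
                abandonDatanode();              // continue on disk only
            }
        }
    }

    @Override
    public void close() throws IOException {
        backup.close();
        if (datanode != null) {
            try {
                datanode.close();
                return;                         // common case: the fast path succeeded
            } catch (IOException e) {
                datanode = null;                // fall through to the replay below
            }
        }
        // Fallback: retransmit the entire block from the local copy.
        try (InputStream in = Files.newInputStream(backupFile);
             OutputStream retry = connector.open()) {
            in.transferTo(retry);
        }
    }

    private void abandonDatanode() {
        try {
            datanode.close();
        } catch (IOException ignored) {
            // the connection is already broken
        }
        datanode = null;
    }
}
```

On the happy path the datanode sees each chunk as soon as it is produced, so the transfer overlaps the local write instead of following it. The backup file is read again only when the connection breaks. A real client would also delete the backup file once the block is acknowledged, which this sketch leaves out.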
In my tests transferring 237M and 946M datasets with -copyFromLocal, I'm
seeing a 20-25% improvement in throughput.