Parallel data/socket writing for DFSOutputStream
------------------------------------------------

                 Key: HADOOP-445
                 URL: http://issues.apache.org/jira/browse/HADOOP-445
             Project: Hadoop
          Issue Type: Improvement
    Affects Versions: 0.5.0
            Reporter: Benjamin Reed
         Attachments: fastClientWrite.patch

Currently, when a DFS client outputs a block, it writes the entire block to 
local disk before it starts transmitting to the datanode. Keeping the block on 
disk lets the client retry the block write if the datanode fails in the middle 
of a transfer, but writing to disk first and then to the datanode adds latency. 
The common case should be that block transfers to datanodes succeed. This patch 
writes to the datanode and the disk in parallel. If the write to the datanode 
fails, it falls back to the current behavior.
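
As a rough illustration of the idea (this is not the code in 
fastClientWrite.patch), the sketch below sends each chunk to both a local copy 
and the datanode stream, and keeps the local copy so the block can be resent if 
the datanode write fails. The class and method names are hypothetical, an 
in-memory buffer stands in for the on-disk block file, and the two writes are 
interleaved in one thread rather than truly run in parallel.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Hypothetical sketch only; names are illustrative, not Hadoop APIs. */
public class TeeBlockWriter {
    // Assumption: an in-memory buffer stands in for the client's local block file.
    private final ByteArrayOutputStream localCopy = new ByteArrayOutputStream();
    private final OutputStream datanode;
    private boolean datanodeFailed = false;

    public TeeBlockWriter(OutputStream datanode) {
        this.datanode = datanode;
    }

    /** Always record the chunk locally; also send it while the datanode is healthy. */
    public void write(byte[] buf, int off, int len) {
        localCopy.write(buf, off, len);
        if (datanodeFailed) {
            return; // stop sending, keep buffering so the whole block can be retried
        }
        try {
            datanode.write(buf, off, len);
        } catch (IOException e) {
            datanodeFailed = true; // fall back: the block will be resent from the local copy
        }
    }

    /** True if the streamed write failed and the block must be resent. */
    public boolean needsRetry() {
        return datanodeFailed;
    }

    /** Fallback path: resend the complete block from the local copy. */
    public void retryTo(OutputStream otherDatanode) throws IOException {
        localCopy.writeTo(otherDatanode);
        otherDatanode.flush();
    }
}
```

In the successful case the block has already reached the datanode by the time 
the local copy is complete, so no second pass over the data is needed. Only the 
failure case pays for a full resend from the local copy.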

In my tests transferring 237M and 946M datasets with -copyFromLocal, I'm 
seeing a 20-25% improvement in throughput.


        
