[ https://issues.apache.org/jira/browse/HADOOP-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
dhruba borthakur updated HADOOP-1707:
-------------------------------------

    Attachment: clientDiskBuffer15.patch

Found a race condition that was causing the client to close the connection before the datanodes had a chance to process the end-of-packet marker. The datanode treated the premature close as an error condition, which in turn made the client run error recovery and re-send the outstanding packets to the remaining good datanodes. This was causing a performance regression. (See the first sketch below for the corrected close ordering.)

> Remove the DFS Client disk-based cache
> --------------------------------------
>
>                 Key: HADOOP-1707
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1707
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>             Fix For: 0.16.0
>
>         Attachments: clientDiskBuffer.patch, clientDiskBuffer10.patch, clientDiskBuffer11.patch, clientDiskBuffer12.patch, clientDiskBuffer14.patch, clientDiskBuffer15.patch, clientDiskBuffer2.patch, clientDiskBuffer6.patch, clientDiskBuffer7.patch, clientDiskBuffer8.patch, clientDiskBuffer9.patch, DataTransferProtocol.doc, DataTransferProtocol.html
>
>
> The DFS client currently uses a staging file on local disk to cache all user writes to a file. When the staging file accumulates one block's worth of data, its contents are flushed to an HDFS datanode. These operations occur sequentially.
> A simple optimization is to let the user write to a second staging file while the contents of the first are simultaneously uploaded to HDFS; this overlap will improve file-upload performance. (See the second sketch below for the double-buffering idea.)
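
First sketch: a minimal illustration of the close-path ordering behind the race described in the comment above. This is not the actual DFSClient code; the names PipelineWriter, LAST_PACKET, closeRacy, and closeSafely are hypothetical, and the real client/datanode exchange (see the attached DataTransferProtocol documents) is more involved. The only point shown is the ordering fix: wait for the datanode to acknowledge the end-of-packet marker before tearing down the connection.

    import java.io.DataInputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.net.Socket;

    class PipelineWriter {
        // Hypothetical end-of-stream marker; the real wire format is
        // defined by the data transfer protocol, not shown here.
        private static final int LAST_PACKET = -1;

        private final Socket socket;
        private final DataOutputStream out;
        private final DataInputStream ackIn;

        PipelineWriter(Socket socket) throws IOException {
            this.socket = socket;
            this.out = new DataOutputStream(socket.getOutputStream());
            this.ackIn = new DataInputStream(socket.getInputStream());
        }

        void writePacket(byte[] data) throws IOException {
            out.writeInt(data.length);
            out.write(data);
        }

        // Racy ordering: closing right after sending the last packet can
        // beat the datanode's read of it; the datanode sees a dropped
        // connection, reports an error, and the client needlessly runs
        // recovery and re-sends packets to the remaining good datanodes.
        void closeRacy() throws IOException {
            out.writeInt(LAST_PACKET);
            out.flush();
            socket.close();              // may win the race with the datanode
        }

        // Fixed ordering: block until the datanode acknowledges the
        // end-of-packet marker, then close.
        void closeSafely() throws IOException {
            out.writeInt(LAST_PACKET);
            out.flush();
            int ack = ackIn.readInt();   // wait for the datanode's ack
            if (ack != 0) {
                throw new IOException("bad ack for last packet: " + ack);
            }
            socket.close();              // safe: datanode saw end-of-stream
        }
    }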
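
Second sketch: the double-buffered staging idea from the quoted description, where the writer fills one block-sized buffer while a background thread uploads the previously completed one. This illustrates the general technique only, not the HADOOP-1707 patches (which remove the disk-based staging file altogether); the uploadBlock callback is a hypothetical stand-in for shipping a completed staging file to a datanode, with in-memory byte arrays standing in for the staging files.

    import java.util.concurrent.ArrayBlockingQueue;
    import java.util.concurrent.BlockingQueue;
    import java.util.function.Consumer;

    class DoubleBufferedWriter {
        private final int blockSize;
        private final Consumer<byte[]> uploadBlock;   // stand-in for the HDFS upload
        // Capacity 1: at most one completed block waits for upload while a
        // second is being filled -- two buffers in flight, matching the
        // two-staging-file scheme in the description.
        private final BlockingQueue<byte[]> fullBlocks = new ArrayBlockingQueue<>(1);
        private final Thread uploader;
        private byte[] current;
        private int pos;

        DoubleBufferedWriter(int blockSize, Consumer<byte[]> uploadBlock) {
            this.blockSize = blockSize;
            this.uploadBlock = uploadBlock;
            this.current = new byte[blockSize];
            this.uploader = new Thread(() -> {
                try {
                    while (true) {
                        byte[] block = fullBlocks.take(); // wait for a completed block
                        if (block.length == 0) break;     // empty array = end-of-stream
                        uploadBlock.accept(block);
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            this.uploader.start();
        }

        // User writes land in 'current' while the uploader thread drains the
        // previously completed block, overlapping the upload with new writes.
        void write(byte[] data, int off, int len) throws InterruptedException {
            while (len > 0) {
                int n = Math.min(len, blockSize - pos);
                System.arraycopy(data, off, current, pos, n);
                pos += n; off += n; len -= n;
                if (pos == blockSize) {
                    fullBlocks.put(current);        // hand off; blocks if uploader lags
                    current = new byte[blockSize];  // keep writing into a fresh buffer
                    pos = 0;
                }
            }
        }

        void close() throws InterruptedException {
            if (pos > 0) {                          // flush the partial last block
                byte[] tail = new byte[pos];
                System.arraycopy(current, 0, tail, 0, pos);
                fullBlocks.put(tail);
            }
            fullBlocks.put(new byte[0]);            // end-of-stream marker
            uploader.join();
        }
    }

With the queue capacity of 1, a writer that outruns the upload simply blocks on the hand-off, which bounds memory at two block-sized buffers, the same bound the two-staging-file scheme would give on disk.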