[ https://issues.apache.org/jira/browse/HADOOP-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
dhruba borthakur updated HADOOP-1707: ------------------------------------- Attachment: clientDiskBuffer6.patch This patch removes the client side disk buffer. 1. FSConstants.java : Bumped up DATA_TRANSFER_VERSION. 2. Daemon.java: Added a ThreadGroup to the Daemon class. All worker threads that process data transfers belong to this group. The shutdown of a datnode waits for the entire threadgroup to exit. Prior to this change, a datanode shutdown did not wait for the data transfer threads to exit. 3. FSNamesystem.java: Allows a zero size file to have no blocks associated with it. 4. DataChecksum.java: A utility method to return the size of a checksum header. 5. FSDataset.java: The ongoingCreates data structure remembers the thread that is currently writing to a block. The writeToBlock() method (when the recovery flag is set) terminates any existing threads that might have been writing to a block before allowing a new thread to write to the same block. 6. FSDataOutputStream.java: The unit test needed to extract the pipeline associated with a block. This is facilitated by exposing a new public API called getWrappedStream() that returns the underlying DFSOutputStream object. 7. MiniDFSCluster.java: Allows stopping a particular datanode. 8. DFSClient.java/DataNode.java: User data is transferred in the form of packets. Each Packet requires an ack from all datanodes. The DFSClient drives the entire recovery strategy. A keepalive is sent every READ_TIMEOUT/2 period on the response socket channel. Each packet is 64K in size and the client has a sliding window of 5MB per stream. 9. TestDatanodeDeath.java: A unit test to trigger datanode deaths and DFSClient recovery. > Remove the DFS Client disk-based cache > -------------------------------------- > > Key: HADOOP-1707 > URL: https://issues.apache.org/jira/browse/HADOOP-1707 > Project: Hadoop > Issue Type: Improvement > Components: dfs > Reporter: dhruba borthakur > Assignee: dhruba borthakur > Fix For: 0.16.0 > > Attachments: clientDiskBuffer.patch, clientDiskBuffer2.patch, > clientDiskBuffer6.patch > > > The DFS client currently uses a staging file on local disk to cache all > user-writes to a file. When the staging file accumulates 1 block worth of > data, its contents are flushed to a HDFS datanode. These operations occur > sequentially. > A simple optimization of allowing the user to write to another staging file > while simultaneously uploading the contents of the first staging file to HDFS > will improve file-upload performance. -- This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.