[jira] Updated: (HADOOP-1707) Remove the DFS Client disk-based cache

dhruba borthakur (JIRA) Tue, 13 Nov 2007 01:09:15 -0800

     [ 
https://issues.apache.org/jira/browse/HADOOP-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]


dhruba borthakur updated HADOOP-1707:
-------------------------------------

    Attachment: clientDiskBuffer6.patch

This patch removes the client side disk buffer. 

1. FSConstants.java : Bumped up DATA_TRANSFER_VERSION.
2. Daemon.java: Added a ThreadGroup to the Daemon class. All worker threads 
that process data transfers belong to this group. The shutdown of a datanode 
now waits for the entire thread group to exit. Prior to this change, a 
datanode shutdown did not wait for the data transfer threads to exit.
3. FSNamesystem.java: Allows a zero-size file to have no blocks associated 
with it.
4. DataChecksum.java: A utility method to return the size of a checksum header.
5. FSDataset.java: The ongoingCreates data structure remembers the thread that 
is currently writing to a block. The writeToBlock() method (when the recovery 
flag is set) terminates any existing threads that might have been writing to a 
block before allowing a new thread to write to the same block.
6. FSDataOutputStream.java: The unit test needed to extract the pipeline 
associated with a block. This is facilitated by exposing a new public API 
called getWrappedStream() that returns the underlying DFSOutputStream object.
7. MiniDFSCluster.java: Allows stopping a particular datanode.
8. DFSClient.java/DataNode.java: User data is transferred in the form of 
packets. Each packet requires an ack from all datanodes. The DFSClient drives 
the entire recovery strategy. A keepalive is sent every READ_TIMEOUT/2 period 
on the response socket channel. Each packet is 64 KB in size and the client 
has a sliding window of 5 MB per stream.
9. TestDatanodeDeath.java: A unit test to trigger datanode deaths and DFSClient 
recovery.
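
To illustrate the packet window described in item 8, here is a minimal sketch 
in Java. It is not the code from the patch: the class PacketWindow and its 
methods trySend(), ack() and inFlight() are hypothetical names chosen for this 
example. Only the 64 KB packet size and the 5 MB per-stream window come from 
the notes above. For simplicity the sketch refuses a send when the window is 
full, whereas a real client would presumably block the writer until an ack 
arrives.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical sketch of a bounded window of unacknowledged packets. */
public class PacketWindow {
    public static final int PACKET_SIZE = 64 * 1024;         // 64 KB per packet (from the patch notes)
    public static final int WINDOW_BYTES = 5 * 1024 * 1024;  // 5 MB per stream (from the patch notes)
    public static final int MAX_IN_FLIGHT = WINDOW_BYTES / PACKET_SIZE;

    // Sequence numbers of packets sent but not yet acked, oldest first.
    private final Deque<Long> unacked = new ArrayDeque<>();
    private long nextSeqno = 0;

    /** Records one packet as sent and returns its seqno, or -1 if the window is full. */
    public synchronized long trySend() {
        if (unacked.size() >= MAX_IN_FLIGHT) {
            return -1;
        }
        long seqno = nextSeqno++;
        unacked.addLast(seqno);
        return seqno;
    }

    /** Call once every datanode has acked the packet; acks must arrive in order. */
    public synchronized void ack(long seqno) {
        Long head = unacked.peekFirst();
        if (head == null || head != seqno) {
            throw new IllegalStateException("unexpected ack for seqno " + seqno);
        }
        unacked.removeFirst();
    }

    /** Number of packets currently awaiting an ack. */
    public synchronized int inFlight() {
        return unacked.size();
    }
}
```

With these constants the window holds 5 MB / 64 KB = 80 packets, so the 81st 
trySend() is refused until the oldest outstanding packet is acked.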
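
The shutdown behaviour described in item 2 can be sketched along the 
following lines. This is an illustration under assumptions, not the Daemon 
class from the patch: the class WorkerGroupShutdown, the group name, the 
polling interval, the timeout parameter and the use of interrupt() to ask 
workers to stop are all hypothetical. It relies only on standard 
java.lang.ThreadGroup calls. Note that ThreadGroup.activeCount() is documented 
as an estimate, so a production implementation may prefer to track and join 
its threads explicitly.

```java
/** Hypothetical sketch: workers share a ThreadGroup and shutdown waits for it to drain. */
public class WorkerGroupShutdown {
    // All data-transfer workers are created inside this group (name is an assumption).
    private final ThreadGroup group = new ThreadGroup("dataTransfer");

    /** Starts a daemon worker thread that belongs to the shared group. */
    public Thread startWorker(Runnable work) {
        Thread t = new Thread(group, work);
        t.setDaemon(true);
        t.start();
        return t;
    }

    /**
     * Interrupts every worker, then polls until the group has no live threads.
     * Returns true if the group drained before the timeout, false otherwise.
     */
    public boolean shutdown(long timeoutMillis) {
        group.interrupt();
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (group.activeCount() > 0) {
            if (System.currentTimeMillis() >= deadline) {
                return false;
            }
            try {
                Thread.sleep(10);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // preserve the caller's interrupt status
                return false;
            }
        }
        return true;
    }
}
```

A caller would start each transfer with startWorker(...) and invoke 
shutdown(...) when stopping the node, so that no transfer thread is still 
running once shutdown returns true.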




> Remove the DFS Client disk-based cache
> --------------------------------------
>
>                 Key: HADOOP-1707
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1707
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>             Fix For: 0.16.0
>
>         Attachments: clientDiskBuffer.patch, clientDiskBuffer2.patch, 
> clientDiskBuffer6.patch
>
>
> The DFS client currently uses a staging file on local disk to cache all 
> user-writes to a file. When the staging file accumulates 1 block worth of 
> data, its contents are flushed to a HDFS datanode. These operations occur 
> sequentially.
> A simple optimization of allowing the user to write to another staging file 
> while simultaneously uploading the contents of the first staging file to HDFS 
> will improve file-upload performance.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
