[ https://issues.apache.org/jira/browse/HADOOP-1707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12538657 ]
dhruba borthakur commented on HADOOP-1707:
------------------------------------------

I agree that the timeout issue does not have a very elegant solution. Here is a new proposal.

The Client
----------
1. The client uses a small pool of memory buffers per dfs-output stream, say 10 buffers of 64K each.
2. A write to the output stream actually copies the user data into one of these buffers, if one is available. Otherwise the user write blocks.
3. A separate thread (one per output stream) sends buffers that are full. Each buffer carries metadata: a sequence number (generated locally on the client), the length of the buffer, and its offset within the block.
4. Another thread (one per output stream) processes incoming responses. Each response carries the sequence number of the buffer that the datanodes have processed, and the client removes that buffer from its queue.

(A sketch of this client-side scheme appears at the end of this message.)

The Primary Datanode
--------------------
The primary datanode runs two threads per stream. The first thread processes incoming packets from the client, writes them to the downstream datanode, and writes them to local disk. The second thread processes responses from downstream datanodes and forwards them back to the client. This means that the client gets back an ack only when the packet is persisted on all datanodes. In the future this can be relaxed so that the client gets an ack as soon as the data is persisted on dfs.replication.min datanodes.

If the primary datanode encounters an exception while writing to the downstream datanode, it declares the block as bad. It removes the immediate downstream datanode from the pipeline, makes an RPC to the namenode to abandon the current blockId and *replace* it with a new one, and then establishes a new pipeline over the remaining datanodes using the new blockId. It then copies all the data from the local temporary block file to the downstream datanodes under the new blockId. (See the recovery sketch at the end of this message.)

The Secondary Datanodes
-----------------------
Each secondary datanode runs two threads per stream. The first thread processes incoming packets from the upstream datanode, writes them to the downstream datanode, and writes them to local disk. The second thread processes responses from downstream datanodes and forwards them back to the upstream datanode. Each secondary datanode sends its own response as well as forwarding the responses of all downstream datanodes. (The relay sketch at the end of this message illustrates this two-thread structure, which the primary and the secondaries share.)

> Remove the DFS Client disk-based cache
> --------------------------------------
>
>                 Key: HADOOP-1707
>                 URL: https://issues.apache.org/jira/browse/HADOOP-1707
>             Project: Hadoop
>          Issue Type: Improvement
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>             Fix For: 0.16.0
>
>
> The DFS client currently uses a staging file on local disk to cache all
> user-writes to a file. When the staging file accumulates 1 block worth of
> data, its contents are flushed to an HDFS datanode. These operations occur
> sequentially.
>
> A simple optimization of allowing the user to write to another staging file
> while simultaneously uploading the contents of the first staging file to HDFS
> will improve file-upload performance.
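To make the client-side scheme concrete, here is a minimal, hypothetical Java sketch of the proposed buffer pool, sender thread, and ack thread. All class and method names are invented for illustration; this is not the actual DFSClient code, and the datanode connection is replaced by a self-contained loopback stub so the sketch runs on its own.

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of the proposed client-side packet pipeline.
class PacketOutputStream {
    static final int NUM_BUFFERS = 10;          // small pool per output stream
    static final int BUFFER_SIZE = 64 * 1024;   // 64K per buffer

    static class Packet {
        long seqno;            // locally generated sequence number
        long offsetInBlock;    // offset of this packet within the block
        int length;            // bytes of user data in this packet
        final byte[] data = new byte[BUFFER_SIZE];
    }

    // Empty buffers; user writes block when this is drained.
    private final BlockingQueue<Packet> freePool = new ArrayBlockingQueue<>(NUM_BUFFERS);
    // Full buffers waiting for the sender thread.
    private final BlockingQueue<Packet> sendQueue = new ArrayBlockingQueue<>(NUM_BUFFERS);
    // Sent but not yet acknowledged, in send order.
    private final BlockingQueue<Packet> ackQueue = new ArrayBlockingQueue<>(NUM_BUFFERS);

    private long nextSeqno = 0;
    private long blockOffset = 0;

    PacketOutputStream() {
        for (int i = 0; i < NUM_BUFFERS; i++) {
            freePool.add(new Packet());
        }
        new Thread(this::senderLoop, "packet-sender").start();   // step 3
        new Thread(this::ackLoop, "ack-processor").start();      // step 4
    }

    // Step 2: copy user data into a pooled buffer, blocking when none is free.
    public void write(byte[] src, int off, int len) throws InterruptedException {
        while (len > 0) {
            Packet p = freePool.take();              // blocks while all 10 buffers are in flight
            int n = Math.min(len, BUFFER_SIZE);
            System.arraycopy(src, off, p.data, 0, n);
            p.length = n;
            p.seqno = nextSeqno++;
            p.offsetInBlock = blockOffset;
            blockOffset += n;
            sendQueue.put(p);
            off += n;
            len -= n;
        }
    }

    // Step 3: stream full buffers to the primary datanode.
    private void senderLoop() {
        try {
            while (true) {
                Packet p = sendQueue.take();
                sendToDatanode(p);    // would write seqno, offset, length and data to the socket
                ackQueue.put(p);      // now waiting for the pipeline's response
            }
        } catch (InterruptedException e) { /* stream closed */ }
    }

    // Step 4: retire buffers as responses come back from the pipeline.
    private void ackLoop() {
        try {
            while (true) {
                Packet p = ackQueue.take();      // acks arrive in send order
                long acked = waitForAck();       // seqno echoed back by the datanodes
                if (acked == p.seqno) {
                    freePool.put(p);             // buffer is reusable again
                }
                // A mismatched or missing ack would trigger error handling,
                // which this sketch omits.
            }
        } catch (InterruptedException e) { /* stream closed */ }
    }

    // Loopback stand-in for the datanode connection so the sketch is
    // self-contained: every packet is "acked" immediately.
    private final BlockingQueue<Long> loopbackAcks = new LinkedBlockingQueue<>();
    private void sendToDatanode(Packet p) throws InterruptedException { loopbackAcks.put(p.seqno); }
    private long waitForAck() throws InterruptedException { return loopbackAcks.take(); }
}

Note how back-pressure falls out of the pool size: with all 10 buffers unacknowledged, write() simply blocks on freePool.take() until the ack thread recycles one.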
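The two per-stream threads on a datanode (primary and secondaries alike) could look like the following hypothetical sketch. The receiver thread mirrors each packet to disk and downstream; the responder thread relays downstream responses upstream while appending this node's own status, which is how each secondary sends its response as well as forwarding those of all downstream datanodes. The stream types are illustrative stand-ins for the sockets and the temporary block file, and for brevity the sketch assumes a node's local write has completed by the time the downstream ack arrives.

import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical sketch of the two per-stream relay threads on a datanode.
class BlockRelay {
    private final DataInputStream upstreamIn;     // packets from the client or upstream datanode
    private final DataOutputStream upstreamOut;   // responses back toward the client
    private final DataOutputStream downstreamOut; // packets to the next datanode; null at the tail
    private final DataInputStream downstreamIn;   // responses from the next datanode; null at the tail
    private final OutputStream localDisk;         // temporary block file on this datanode

    // Seqnos persisted locally; the tail datanode originates acks from this queue.
    private final BlockingQueue<Long> persisted = new LinkedBlockingQueue<>();

    BlockRelay(DataInputStream upIn, DataOutputStream upOut,
               DataOutputStream downOut, DataInputStream downIn, OutputStream disk) {
        upstreamIn = upIn; upstreamOut = upOut;
        downstreamOut = downOut; downstreamIn = downIn; localDisk = disk;
    }

    void start() {
        new Thread(this::receiverLoop, "packet-receiver").start();
        new Thread(this::responderLoop, "ack-responder").start();
    }

    // Thread 1: read each packet, forward it downstream, then write it to disk.
    private void receiverLoop() {
        try {
            while (true) {
                long seqno = upstreamIn.readLong();
                int len = upstreamIn.readInt();
                byte[] data = new byte[len];
                upstreamIn.readFully(data);

                if (downstreamOut != null) {          // not the last node in the pipeline
                    downstreamOut.writeLong(seqno);
                    downstreamOut.writeInt(len);
                    downstreamOut.write(data);
                    downstreamOut.flush();
                }
                localDisk.write(data);                // persist locally
                if (downstreamOut == null) {
                    persisted.put(seqno);             // the tail acks once data hits its disk
                }
            }
        } catch (IOException | InterruptedException e) {
            // On the primary, an error writing downstream would trigger the
            // pipeline-recovery path (see the next sketch).
        }
    }

    // Thread 2: relay downstream responses upstream, appending this node's own
    // status, so the client ultimately sees one ack per datanode in the pipeline.
    private void responderLoop() {
        try {
            while (true) {
                long seqno;
                int downstreamAcks;
                if (downstreamIn != null) {
                    seqno = downstreamIn.readLong();      // forwarded from below
                    downstreamAcks = downstreamIn.readInt();
                } else {
                    seqno = persisted.take();             // tail originates the ack
                    downstreamAcks = 0;
                }
                upstreamOut.writeLong(seqno);
                upstreamOut.writeInt(downstreamAcks + 1); // add this datanode's response
                upstreamOut.flush();
            }
        } catch (IOException | InterruptedException e) {
            // Connection lost; the client or primary drives recovery.
        }
    }
}

Because an ack only leaves the tail after its local disk write, and each upstream node forwards it after its own write, the client's ack for a seqno implies the packet is persisted on every datanode in the pipeline.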
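Finally, a hypothetical sketch of the primary datanode's recovery path: drop the failed downstream node, abandon the bad blockId via a namenode RPC, get a replacement blockId, and replay the local temporary block file. The Namenode and Datanode interfaces here are invented stand-ins, not Hadoop's real RPC API, and for simplicity the sketch streams the temporary file to each surviving datanode directly rather than re-chaining them into a pipeline.

import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the primary datanode's pipeline recovery.
class PipelineRecovery {

    /** Invented stand-in for the namenode RPC described in the proposal. */
    interface Namenode {
        // Abandon the bad blockId and hand back a replacement in one call.
        long abandonAndReplaceBlock(long badBlockId) throws IOException;
    }

    /** Invented stand-in for a connection to one downstream datanode. */
    interface Datanode {
        OutputStream openBlockStream(long blockId) throws IOException;
    }

    // Called when writing to the immediate downstream datanode throws.
    static void recover(Namenode namenode,
                        List<Datanode> pipeline,    // current downstream datanodes
                        Datanode failedNode,
                        long badBlockId,
                        File localTempBlockFile) throws IOException {
        // 1. Remove the immediate downstream datanode from the pipeline.
        List<Datanode> survivors = new ArrayList<>(pipeline);
        survivors.remove(failedNode);

        // 2. Abandon the current blockId and obtain a replacement from the namenode.
        long newBlockId = namenode.abandonAndReplaceBlock(badBlockId);

        // 3. Re-establish streams under the new blockId and replay all the data
        //    already persisted in the local temporary block file.
        for (Datanode dn : survivors) {
            try (FileInputStream in = new FileInputStream(localTempBlockFile);
                 OutputStream out = dn.openBlockStream(newBlockId)) {
                in.transferTo(out);   // copy the whole temporary block file
            }
        }
    }
}

The local temporary block file is what makes this replay possible: the primary already holds every byte it has acknowledged, so no data needs to be re-fetched from the client.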