[ https://issues.apache.org/jira/browse/HADOOP-3113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

dhruba borthakur updated HADOOP-3113:
-------------------------------------

    Summary: DFSOutputStream.flush() should flush data to real block file on 
DataNode.  (was: Provide a configurable way for DFSOutputStream.flush() to flush 
data to real block file on DataNode.)

> DFSOutputStream.flush() should flush data to real block file on DataNode.
> -------------------------------------------------------------------------
>
>                 Key: HADOOP-3113
>                 URL: https://issues.apache.org/jira/browse/HADOOP-3113
>             Project: Hadoop Core
>          Issue Type: Bug
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: noTmpFile.patch
>
>
> DFSOutputStream has a method called flush() that persists block locations on 
> the namenode and sends all outstanding data to all datanodes in the pipeline. 
> However, this data goes to a tmp file on the datanode(s). When the block is 
> closed, the tmp file is renamed to become the real block file. If the 
> datanode(s) die before the block is complete, the entire block is lost. This 
> behaviour will be fixed in HADOOP-1700.
> However, in the short term, a configuration parameter can be used to allow 
> datanodes to write to the real block file directly, thereby avoiding the tmp 
> file altogether. This means that data flushed successfully by a client is not 
> lost even if the datanode(s) or the client dies.
> The Namenode already has code to pick the largest replica (if multiple 
> datanodes have different sizes of this block). Also, the namenode has code to 
> not trigger a replication request if the file is still being written to.
> The only caveat that I can think of is that the block report period should be 
> much smaller than the lease timeout period. A block report adds the 
> being-written-to blocks to the blocksMap, thereby avoiding any cleanup that 
> lease-expiry processing might otherwise have done.
> Not all requirements specified by HADOOP-1700 are supported by this approach, 
> but it could still be helpful (in the short term) for a wide range of 
> applications.
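
To make the behaviour described above concrete, here is a minimal client-side 
sketch of the write-and-flush pattern this fix targets. The class name and file 
path are illustrative, and it assumes the FileSystem API of this era, where 
flush() on the returned stream reaches DFSOutputStream.flush() as described in 
the issue:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class FlushExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Open a file for writing; on DFS the underlying stream is a
        // DFSOutputStream.
        FSDataOutputStream out = fs.create(new Path("/logs/app.log"));
        out.write("record 1\n".getBytes());

        // flush() persists block locations on the namenode and pushes all
        // outstanding data to the datanodes in the pipeline. With this fix
        // the data lands in the real block file instead of a tmp file, so it
        // survives a datanode or client crash that happens before close().
        out.flush();

        out.write("record 2\n".getBytes());
        out.close(); // closing the file finalizes the block
      }
    }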
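
The description gates the direct-to-block-file write behind a configuration 
parameter (though the summary change above drops "configurable", suggesting the 
behaviour may become unconditional). Purely as an illustration, with a 
hypothetical property name that is not taken from noTmpFile.patch, such a 
switch could be read like this:

    import org.apache.hadoop.conf.Configuration;

    public class DirectWriteToggle {
      public static void main(String[] args) {
        Configuration conf = new Configuration();
        // Hypothetical key, for illustration only; the real name (if any) is
        // defined by the attached patch. A default of false preserves the
        // old write-to-tmp-then-rename behaviour.
        boolean direct =
            conf.getBoolean("dfs.datanode.write.to.block.file", false);
        System.out.println("write directly to block file: " + direct);
      }
    }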

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
