Provide a configurable way for DFSOutputStream.flush() to flush data to the 
real block file on the DataNode.
---------------------------------------------------------------------------------------------------

                 Key: HADOOP-3113
                 URL: https://issues.apache.org/jira/browse/HADOOP-3113
             Project: Hadoop Core
          Issue Type: Bug
          Components: dfs
            Reporter: dhruba borthakur


DFSOutputStream has a method called flush() that persists block locations on 
the namenode and sends all outstanding data to all datanodes in the pipeline. 
However, this data goes to a tmp file on the datanode(s). When the block is 
closed, the tmp file is renamed to the real block file. If the datanode(s) die 
before the block is complete, the entire block is lost. This behaviour will be 
fixed in HADOOP-1700.
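
For reference, here is a minimal sketch (not from any patch) of where flush() 
sits in a typical client write path; the file name and record contents are 
made up for illustration:

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.fs.FSDataOutputStream;
  import org.apache.hadoop.fs.FileSystem;
  import org.apache.hadoop.fs.Path;

  public class FlushExample {
    public static void main(String[] args) throws Exception {
      Configuration conf = new Configuration();
      FileSystem fs = FileSystem.get(conf);
      // hypothetical file name, just for illustration
      FSDataOutputStream out = fs.create(new Path("/user/foo/events.log"));
      out.write("one record\n".getBytes());
      out.flush();   // persists block locations on the namenode and pushes
                     // outstanding bytes to the datanodes' tmp block files
      // If a datanode (or the client) dies before the block is closed, the
      // flushed bytes can still be lost -- that is the gap described above.
      out.close();
    }
  }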

However, in the short term, a configuration parameter can be used to allow 
datanodes to write to the real block file directly, thereby avoiding writing to 
the tmp file. This means that data that is flushed successfully by a client 
does not get lost even if the datanode(s) or the client dies.
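
As a rough sketch of what such a datanode-side switch could look like: the 
property name below is only a placeholder (not a committed name), and the 
helper is hypothetical; it just shows the intended decision point in the 
block-write path.

  import org.apache.hadoop.conf.Configuration;
  import java.io.File;

  public class DirectWriteSwitch {
    // Placeholder key; the real name would come with the patch.
    static final String DIRECT_WRITE_KEY =
        "dfs.datanode.write.to.block.file.directly";

    // Chooses where the datanode would place incoming bytes for a block.
    static File chooseTarget(Configuration conf, File tmpFile, File realBlockFile) {
      // default (false) keeps today's behaviour: write to tmp, rename on close
      boolean writeDirect = conf.getBoolean(DIRECT_WRITE_KEY, false);
      return writeDirect ? realBlockFile : tmpFile;
    }
  }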

The Namenode already has code to pick the largest replica (if multiple 
datanodes report different sizes for this block). Also, the namenode already 
has code to not trigger a replication request for a file that is still being 
written to.

The only caveat that I can think of is that the block report periodicity should 
be much smaller than the lease timeout period. A block report adds the 
being-written-to blocks to the blocksMap, thereby avoiding any cleanup that 
lease-expiry processing might otherwise have done.
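
To make that concrete, a deployment using this option would want to set the 
block report interval well below the namenode's lease expiry period. The 
value below is illustrative only; dfs.blockreport.intervalMsec is the existing 
datanode setting:

  import org.apache.hadoop.conf.Configuration;

  public class BlockReportTuning {
    public static void main(String[] args) {
      Configuration conf = new Configuration();
      // Illustrative value only: report blocks far more often than the lease
      // expiry period, so being-written-to blocks land in the blocksMap before
      // any lease-expiry cleanup could run.
      conf.setLong("dfs.blockreport.intervalMsec", 5 * 60 * 1000L); // 5 minutes
      System.out.println("block report interval (msec) = "
          + conf.getLong("dfs.blockreport.intervalMsec", 0L));
    }
  }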

Not all requirements specified by HADOOP-1700 are supported by this approach, 
but it could still be helpful (in the short term) for a wide range of 
applications.




