Provide a configurable way for DFSOutputStream.flush() to flush data to the real
block file on the DataNode.
---------------------------------------------------------------------------------------------------
Key: HADOOP-3113
URL: https://issues.apache.org/jira/browse/HADOOP-3113
Project: Hadoop Core
Issue Type: Bug
Components: dfs
Reporter: dhruba borthakur
DFSOutputStream has a method called flush() that persists block locations on
the namenode and sends all outstanding data to all datanodes in the pipeline.
However, this data goes to a tmp file on the datanode(s). Only when the block is
closed is the tmp file renamed to the real block file. If a datanode
dies before the block is complete, the entire block is lost. This behaviour will
be fixed in HADOOP-1700.
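For context, the flush() in question is reached through the FSDataOutputStream
returned by FileSystem.create(); a minimal usage sketch (the path and record
contents are purely illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class FlushExample {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        // Write some records and flush. Today the flushed bytes land in the
        // datanodes' tmp files and only become the real block file when the
        // block is closed.
        FSDataOutputStream out = fs.create(new Path("/tmp/flush-example.log"));
        out.writeBytes("record 1\n");
        out.flush();   // persists block locations on the namenode and pushes
                       // data down the pipeline, but the block file is still tmp
        out.writeBytes("record 2\n");
        out.close();   // tmp file renamed to the real block file
        fs.close();
      }
    }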
However, in the short term, a configuration parameter can be used to allow
datanodes to write to the real block file directly, thereby avoiding the tmp
file altogether. This means that data that a client has flushed successfully is
not lost even if the datanode(s) or the client dies.
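A rough sketch of how such a switch could look on the datanode side; the
property name dfs.datanode.write.to.block.file and the chooser class below are
hypothetical, not existing code:

    import java.io.File;
    import org.apache.hadoop.conf.Configuration;

    // Sketch only: the config key and this helper class are hypothetical.
    public class BlockFileChooser {
      // Hypothetical key; default false keeps the current tmp-file behaviour.
      public static final String WRITE_TO_BLOCK_FILE_KEY =
          "dfs.datanode.write.to.block.file";

      private final boolean writeToBlockFile;

      public BlockFileChooser(Configuration conf) {
        this.writeToBlockFile = conf.getBoolean(WRITE_TO_BLOCK_FILE_KEY, false);
      }

      /**
       * Decide where incoming block data is written. With the flag on, bytes
       * go straight to the real block file, so data a client has flushed
       * survives a datanode or client crash; with the flag off, behaviour is
       * unchanged.
       */
      public File chooseOutputFile(File tmpDir, File currentDir, String blockName) {
        return writeToBlockFile
            ? new File(currentDir, blockName)   // real block file
            : new File(tmpDir, blockName);      // existing tmp-file path
      }
    }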
The Namenode already has code to pick the largest replica (if different
datanodes report different sizes for this block). Also, the namenode has code
to not trigger a replication request if the file is still being written to.
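To illustrate the two namenode behaviours relied on here, a standalone sketch
(not the actual namenode code):

    import java.util.Collection;

    // Sketch of the two behaviours: among replicas of differing length the
    // longest is preferred, and blocks of files still being written are not
    // queued for re-replication.
    public class UnderConstructionPolicy {

      public static long preferredReplicaLength(Collection<Long> reportedLengths) {
        long longest = 0;
        for (long len : reportedLengths) {
          longest = Math.max(longest, len);
        }
        return longest;
      }

      public static boolean shouldScheduleReplication(boolean fileUnderConstruction,
                                                      int liveReplicas,
                                                      int targetReplication) {
        // Files that are still open for write are skipped.
        if (fileUnderConstruction) {
          return false;
        }
        return liveReplicas < targetReplication;
      }
    }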
The only caveat that I can think of is that the block report periodicity should
be much smaller than the lease timeout period. A block report adds the
blocks being written to the blocksMap, thereby avoiding any cleanup that
lease expiry processing might otherwise have done.
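To make the caveat concrete, a hypothetical startup check could compare the two
intervals; dfs.blockreport.intervalMsec is the existing block report setting,
while the lease period constant, the default value, and the check itself are
assumptions for illustration:

    import org.apache.hadoop.conf.Configuration;

    // Hypothetical sanity check: the block report interval must be much
    // smaller than the lease expiry period, so that blocks being written
    // reach the blocksMap (via block reports) well before lease expiry
    // processing could otherwise clean them up.
    public class IntervalCheck {
      // One-hour lease period assumed here purely for illustration.
      private static final long LEASE_PERIOD_MSEC = 60L * 60 * 1000;

      public static void checkBlockReportInterval(Configuration conf) {
        long blockReportMsec =
            conf.getLong("dfs.blockreport.intervalMsec", 60L * 60 * 1000);
        if (blockReportMsec * 10 > LEASE_PERIOD_MSEC) {
          throw new IllegalArgumentException(
              "dfs.blockreport.intervalMsec (" + blockReportMsec
              + " ms) should be much smaller than the lease period ("
              + LEASE_PERIOD_MSEC + " ms)");
        }
      }
    }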
Not all requirements specified by HADOOP-1700 are supported by this approach,
but it could still be helpful (in the short term) for a wide range of
applications.