[
https://issues.apache.org/jira/browse/HADOOP-3113?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12602222#action_12602222
]
Hadoop QA commented on HADOOP-3113:
-----------------------------------
+1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12383353/tmpFile.patch
against trunk revision 662976.
+1 @author. The patch does not contain any @author tags.
+1 tests included. The patch appears to include 6 new or modified tests.
+1 javadoc. The javadoc tool did not generate any warning messages.
+1 javac. The applied patch does not increase the total number of javac
compiler warnings.
+1 findbugs. The patch does not introduce any new Findbugs warnings.
+1 release audit. The applied patch does not increase the total number of
release audit warnings.
+1 core tests. The patch passed core unit tests.
+1 contrib tests. The patch passed contrib unit tests.
Test results:
http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2566/testReport/
Findbugs warnings:
http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2566/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
Checkstyle results:
http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2566/artifact/trunk/build/test/checkstyle-errors.html
Console output:
http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2566/console
This message is automatically generated.
> DFSOutputStream.flush() should flush data to real block file on DataNode.
> -------------------------------------------------------------------------
>
> Key: HADOOP-3113
> URL: https://issues.apache.org/jira/browse/HADOOP-3113
> Project: Hadoop Core
> Issue Type: Bug
> Components: dfs
> Reporter: dhruba borthakur
> Assignee: dhruba borthakur
> Attachments: noTmpFile.patch, noTmpFile.patch, tmpFile.patch,
> tmpFile.patch, tmpFile.patch, tmpFile.patch
>
>
> DFSOutputStream has a method called flush() that persists block locations on
> the namenode and sends all outstanding data to all datanodes in the pipeline.
> However, this data goes to a tmp file on the datanode(s). When the block is
> closed, the tmp file is renamed to be the real block file. If the
> datanode(s) die before the block is complete, the entire block is lost. This
> behaviour will be fixed in HADOOP-1700.
> However, in the short term, a configuration parameter can be used to allow
> datanodes to write to the real block file directly, thereby avoiding writing
> to the tmp file. This means that data that is flushed successfully by a
> client does not get lost even if the datanode(s) or client dies.
> The Namenode already has code to pick the largest replica (if multiple
> datanodes have different sizes of this block). Also, the namenode has code to
> not trigger replication request if the file is still being written to.
> The only caveat that I can think of is that the block report periodicity
> should be much, much smaller than the lease timeout period. A block report
> adds the being-written-to blocks to the blocksMap, thereby avoiding any
> cleanup that lease-expiry processing might otherwise have done.
> Not all requirements specified by HADOOP-1700 are supported by this approach,
> but it could still be helpful (in the short term) for a wide range of
> applications.
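The two write strategies the description contrasts can be illustrated with a small stdlib-only Java sketch. This is not Hadoop code; the class name, method names, and file layout are hypothetical, standing in for the datanode's tmp-file staging versus the proposed direct write to the real block file.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.nio.file.StandardOpenOption;

// Hypothetical illustration of the tradeoff described in this issue.
public class BlockWriteSketch {

    // Current behavior: flushed bytes land in a tmp file; only the rename
    // on close makes them visible at the real block path. If the process
    // dies before close, everything in the tmp file is lost.
    static void writeViaTmp(Path block, byte[] data) throws IOException {
        Path tmp = block.resolveSibling(block.getFileName() + ".tmp");
        Files.write(tmp, data); // flushed data sits only in the tmp file
        // the "close" step: atomically promote tmp to the real block file
        Files.move(tmp, block, StandardCopyOption.ATOMIC_MOVE);
    }

    // Proposed option: append straight to the real block file, so each
    // flush is durable at the final path even if the writer dies early.
    static void writeDirect(Path block, byte[] data) throws IOException {
        Files.write(block, data,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("blocks");
        writeViaTmp(dir.resolve("blk_1"), "hello".getBytes());
        writeDirect(dir.resolve("blk_2"), "world".getBytes());
        System.out.println(Files.exists(dir.resolve("blk_1"))); // true
        System.out.println(Files.exists(dir.resolve("blk_2"))); // true
    }
}
```

Under the direct-write scheme, replicas may differ in length after a crash, which is why the description leans on the existing namenode logic that picks the largest replica.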