[ 
https://issues.apache.org/jira/browse/HADOOP-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dhruba borthakur updated HADOOP-2657:
-------------------------------------

    Attachment: flush5.patch

Incorporated Raghu's review comments.

One question is "what are the semantics of flush?". My opinion is that the 
client should confirm that the data has reached the OS buffers on all datanodes 
in the pipeline before the flush call returns. This will enable applications 
like HBase to use this flush API on the HBase transaction log (which is an HDFS 
file) and rest easy that it is persisted.

If DFSOutputStream.flush() does not guarantee that the data has reached the 
OS buffers on the datanode(s), then this API might not be useful for HBase.
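To make the proposed semantics concrete, here is a hypothetical sketch in plain Java (not actual DFSClient code; MockDatanode, SketchOutputStream, and their methods are made-up names). The point it illustrates is only the contract argued for above: flush() returns solely after every node in the write pipeline has acknowledged the buffered bytes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not actual DFSClient code) of the flush semantics
// argued for above: flush() returns only after every datanode in the write
// pipeline has acknowledged receipt of the buffered bytes.
public class FlushSemanticsSketch {

    static class MockDatanode {
        int ackedBytes = 0;                  // bytes this node has handed to its OS buffers
        boolean receive(byte[] data) {
            ackedBytes += data.length;       // simulate the write to OS buffers
            return true;                     // ack back up the pipeline
        }
    }

    static class SketchOutputStream {
        private final List<MockDatanode> pipeline;
        private final List<Byte> buffer = new ArrayList<>();

        SketchOutputStream(List<MockDatanode> pipeline) {
            this.pipeline = pipeline;
        }

        void write(byte[] data) {
            for (byte b : data) buffer.add(b);   // client-side buffering only
        }

        // Blocks until every datanode in the pipeline acks; only then may an
        // application like HBase treat its transaction log as persisted.
        void flush() {
            byte[] data = new byte[buffer.size()];
            for (int i = 0; i < data.length; i++) data[i] = buffer.get(i);
            for (MockDatanode dn : pipeline) {
                if (!dn.receive(data)) {
                    throw new IllegalStateException("datanode did not ack flush");
                }
            }
            buffer.clear();                      // flushed bytes are acked everywhere
        }
    }

    public static void main(String[] args) {
        List<MockDatanode> pipeline = new ArrayList<>();
        for (int i = 0; i < 3; i++) pipeline.add(new MockDatanode());
        SketchOutputStream out = new SketchOutputStream(pipeline);
        out.write(new byte[100]);
        out.flush();
        for (MockDatanode dn : pipeline) {
            System.out.println("acked: " + dn.ackedBytes);
        }
    }
}
```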

> Enhancements to DFSClient to support flushing data at any point in time
> -----------------------------------------------------------------------
>
>                 Key: HADOOP-2657
>                 URL: https://issues.apache.org/jira/browse/HADOOP-2657
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: dhruba borthakur
>            Assignee: dhruba borthakur
>         Attachments: flush.patch, flush2.patch, flush3.patch, flush4.patch, 
> flush5.patch
>
>
> The HDFS Append Design (HADOOP-1700) requires that there be a public API to 
> flush data written to an HDFS file that can be invoked by an application. This 
> API (popularly referred to as fflush(OutputStream)) will ensure that data 
> written to the DFSOutputStream is flushed to datanodes and any required 
> metadata is persisted on the Namenode.
> This API has to handle the case when the client decides to flush after 
> writing data that is not an exact multiple of io.bytes.per.checksum.
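On the partial-checksum case in the description above: with the Hadoop default io.bytes.per.checksum of 512, flushing after a byte count that is not a multiple of the chunk size leaves a trailing partial chunk whose checksum covers fewer than 512 bytes and must be recomputed when writing resumes. A small illustrative calculation (partialChunkLen is a made-up helper, not a Hadoop API):

```java
public class PartialChunkSketch {
    // Hadoop's default io.bytes.per.checksum (one CRC per 512-byte chunk).
    static final int BYTES_PER_CHECKSUM = 512;

    // Length of the trailing partial checksum chunk after writing n bytes;
    // zero means the flush falls exactly on a chunk boundary.
    static int partialChunkLen(long bytesWritten) {
        return (int) (bytesWritten % BYTES_PER_CHECKSUM);
    }

    public static void main(String[] args) {
        // Flushing after 1300 bytes leaves a 276-byte partial chunk
        // (1300 = 2 * 512 + 276) whose checksum is over 276 bytes only.
        System.out.println(partialChunkLen(1300)); // prints 276
        System.out.println(partialChunkLen(1024)); // prints 0
    }
}
```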

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
