[
https://issues.apache.org/jira/browse/HADOOP-2657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12572703#action_12572703
]
dhruba borthakur commented on HADOOP-2657:
------------------------------------------
This patch does not require changes to the datanode because the datanode
already has code that deals with packet replays. Each packet carries an "offset in
the block" field. This patch ensures that flushed packets have the correct
value set in "offset in the block".
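For illustration, here is a minimal sketch of the replay check this relies on; the
class, field, and method names (Packet, offsetInBlock, isReplay, bytesOnDisk) are
assumptions for exposition, not the actual DFSClient or datanode internals:

    // Hypothetical sketch; names are illustrative only, not DFSClient internals.
    class PacketReplayExample {
        static class Packet {
            final long offsetInBlock; // where this packet's data starts in the block
            final int dataLen;        // number of data bytes in this packet
            Packet(long offsetInBlock, int dataLen) {
                this.offsetInBlock = offsetInBlock;
                this.dataLen = dataLen;
            }
        }

        // A datanode that already holds 'bytesOnDisk' bytes of the block can
        // treat a packet whose bytes it has fully seen as a replay and drop it.
        static boolean isReplay(Packet p, long bytesOnDisk) {
            return p.offsetInBlock + p.dataLen <= bytesOnDisk;
        }
    }

Because the offset is set correctly even for flushed partial packets, a replayed
packet never corrupts data already on disk.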
A user can "flush" before "close" without any problem. In this case, it is likely that
the flush will result in an RPC to the namenode (to persist block locations).
The close will make another RPC to the namenode that closes the file. Thus,
there will be two RPCs to the namenode in total. If the application does multiple
flushes followed by a close (without writing any new data), it will still result in at
most two RPCs to the namenode.
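As a usage sketch, assuming the flush entry point is the one exposed on the stream
returned by FileSystem.create (the exact public method name added by this patch
may differ):

    // Hedged usage sketch: 'flush()' stands in for whatever public flush
    // entry point the patch exposes on the DFS output stream.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class FlushExample {
        public static void main(String[] args) throws Exception {
            FileSystem fs = FileSystem.get(new Configuration());
            FSDataOutputStream out = fs.create(new Path("/tmp/flush-demo"));
            out.write("first record\n".getBytes());
            out.flush();  // may cost one namenode RPC (persist block locations)
            out.flush();  // no new data since last flush: no extra namenode RPC
            out.close();  // one more namenode RPC to close the file
            // Net effect: at most two namenode RPCs for flush(es) + close.
        }
    }

The second flush with no intervening writes is cheap because the namenode has
already persisted the block locations.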
> Enhancements to DFSClient to support flushing data at any point in time
> -----------------------------------------------------------------------
>
> Key: HADOOP-2657
> URL: https://issues.apache.org/jira/browse/HADOOP-2657
> Project: Hadoop Core
> Issue Type: New Feature
> Components: dfs
> Reporter: dhruba borthakur
> Assignee: dhruba borthakur
> Attachments: flush.patch, flush2.patch
>
>
> The HDFS Append Design (HADOOP-1700) requires that there be a public API to
> flush data written to an HDFS file that can be invoked by an application. This
> API (popularly referred to as fflush(OutputStream)) will ensure that data
> written to the DFSOutputStream is flushed to the datanodes and any required
> metadata is persisted on the Namenode.
> This API has to handle the case when the client decides to flush after
> writing data that is not an exact multiple of io.bytes.per.checksum.
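> A minimal sketch of the boundary arithmetic this implies; the class and
> variable names are illustrative, with bytesPerChecksum mirroring
> io.bytes.per.checksum:
>
>     // Illustrative arithmetic only: a flush may end inside a checksum chunk.
>     import org.apache.hadoop.conf.Configuration;
>
>     public class PartialChunkExample {
>         public static void main(String[] args) {
>             Configuration conf = new Configuration();
>             int bytesPerChecksum = conf.getInt("io.bytes.per.checksum", 512);
>             long bytesWritten = 1300;  // not an exact multiple of 512
>             long partial = bytesWritten % bytesPerChecksum;  // 276 bytes left in the open chunk
>             // The flush must send this partial chunk with a checksum computed
>             // over just 'partial' bytes, and the client must be able to resend
>             // or extend that chunk when new data arrives after the flush.
>             System.out.println("partial chunk bytes: " + partial);
>         }
>     }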