[ 
https://issues.apache.org/jira/browse/HADOOP-89?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#action_12523170
 ] 

dhruba borthakur commented on HADOOP-89:
----------------------------------------

1. The contents of new blocks appended to a file are visible to clients as 
soon as the datanode reports the block to the namenode. This means that data is 
visible to clients even before the block metadata has been persisted on disk. 
This approach lets us avoid an fs-transaction in the edit log for every new 
block allocation.

2. The block allocation for a file is persisted in the edit log when the file 
is closed.

3. A new API FSDataOutputStream.sync() allows an application to make data 
persistent on disk even before the file is closed. The invocation of this API 
causes a transaction to be logged into the edit log to record the blocks that 
are currently allocated to the file. An application that is recording data to a 
log file will periodically invoke this API to ensure that the contents of the 
log file persist even if the application dies before closing the file.

4. The FsShell utility has a new command that is invoked as "bin/hadoop dfs 
-tail [-f] <filename>". When the "-f" option is used, the FsShell utility will 
periodically poll for changes to the file size. When a change in file size is 
detected, it will re-open the file and display the new contents that were 
added to the file.
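
To illustrate points 1 to 3, here is a toy model of the visible-versus-persisted 
distinction. It is a sketch only and not Hadoop code: the class EditLogModel 
and all of its methods are hypothetical names invented for this example, and a 
namenode restart is simulated simply by asking which blocks were recorded in 
the edit log.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of the proposal, NOT Hadoop code. Every name here is hypothetical.
final class EditLogModel {
    // Blocks the namenode holds in memory: this is what readers can see.
    private final List<Long> inMemoryBlocks = new ArrayList<>();
    // Blocks recorded in the edit log: this is what survives a restart.
    private final List<Long> persistedBlocks = new ArrayList<>();
    // Number of edit-log transactions written so far.
    private int transactions = 0;

    // A datanode reports a new block. The block becomes visible at once,
    // and no edit-log transaction is written (point 1).
    void blockReported(long blockId) {
        inMemoryBlocks.add(blockId);
    }

    // sync() logs one transaction recording the blocks currently
    // allocated to the file (point 3).
    void sync() {
        persistedBlocks.clear();
        persistedBlocks.addAll(inMemoryBlocks);
        transactions++;
    }

    // Closing the file persists the block allocation as well (point 2).
    void close() {
        sync();
    }

    // Blocks a reader can see right now.
    List<Long> visibleBlocks() {
        return new ArrayList<>(inMemoryBlocks);
    }

    // Blocks that would still belong to the file after a simulated restart.
    List<Long> blocksAfterRestart() {
        return new ArrayList<>(persistedBlocks);
    }

    int transactionCount() {
        return transactions;
    }
}
```

In this model, a writer that reports two blocks and then dies has two visible 
blocks but none that survive a restart. If it calls sync() first, both blocks 
survive at the cost of one transaction.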

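The polling behaviour in point 4 can be sketched as follows. This is an 
assumption-laden illustration, not the FsShell implementation: it uses the 
local filesystem through java.nio in place of DFS, and the class TailPoller, 
its poll() method and the 64 KB chunk size are hypothetical choices made for 
this example.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of one "tail -f" polling step against a local file.
// Hypothetical stand-in for the FsShell logic; not Hadoop code.
final class TailPoller {
    private static final int MAX_CHUNK = 64 * 1024; // arbitrary cap per poll

    private final Path file;
    private long offset; // number of bytes already displayed

    TailPoller(Path file, long startOffset) {
        this.file = file;
        this.offset = startOffset;
    }

    // Checks the file size. If the file has grown, re-opens it, reads the
    // bytes after the last displayed offset and returns them as text.
    // Returns an empty string when the size has not increased.
    // Note: a multi-byte UTF-8 character may be split at a chunk boundary.
    String poll() throws IOException {
        long size = Files.size(file);
        if (size <= offset) {
            return "";
        }
        int length = (int) Math.min(size - offset, MAX_CHUNK);
        byte[] buffer = new byte[length];
        try (RandomAccessFile in = new RandomAccessFile(file.toFile(), "r")) {
            in.seek(offset);
            in.readFully(buffer);
        }
        offset += length;
        return new String(buffer, StandardCharsets.UTF_8);
    }
}
```

A caller would invoke poll() in a loop with a sleep between iterations and 
print whatever it returns.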

> files are not visible until they are closed
> -------------------------------------------
>
>                 Key: HADOOP-89
>                 URL: https://issues.apache.org/jira/browse/HADOOP-89
>             Project: Hadoop
>          Issue Type: Bug
>          Components: dfs
>    Affects Versions: 0.1.0
>            Reporter: Yoram Arnon
>            Assignee: dhruba borthakur
>            Priority: Critical
>         Attachments: atomicCreation.patch
>
>
> The current behaviour, whereby a file is not visible until it is closed, has 
> several flaws, including:
> 1. no practical way to know if a file/job is progressing
> 2. no way to implement files that never close, such as log files
> 3. failure to close a file results in loss of the file
> The part of the file that's written should be visible.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
