[ https://issues.apache.org/jira/browse/HADOOP-4379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12667952#action_12667952 ]

dhruba borthakur commented on HADOOP-4379:
------------------------------------------

@Doug: waiting for lease recovery to reclaim the length of the file will work 
only if the original writer has died and a new reader then wants to read all 
the data that was written by the writer before it died. This was the use-case 
for Jim Kellerman.

In your case, you have a FileSystem object that was used to write to the file. 
Then your application tried to reopen the same file by invoking fs.append() on 
the same FileSystem object. The namenode logs clearly show that the same 
dfsclient (i.e. filesystem object) was used to try appending to the file. In 
this case, the namenode rejects the append call because it knows that the 
original writer is alive and is still writing to the file. If your aim is to 
have a concurrent reader along with a concurrent writer, then the best that 
this patch can do is to allow the concurrent reader to see the new file length 
only when the block is full. On the other hand, if you can make your 
application not depend on the file length, then you can see all the data in 
the file. Another alternative would be to implement a new call, 
FileSystem.length(), that can retrieve the latest length from the datanode, 
but that can be done as part of a separate JIRA. Please let us know how 
difficult it would be for you to change your app to not depend on the file 
length and just read till end-of-file.
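
For illustration, here is a minimal sketch of the read-till-end-of-file 
approach, assuming the standard FileSystem/FSDataInputStream API; the path 
comes from the command line and the per-buffer processing is left out:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class TailToEof {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path path = new Path(args[0]);   // file that another client is still writing

    // Ignore the length reported by the namenode; just keep reading until
    // read() reports end-of-file.
    FSDataInputStream in = fs.open(path);
    byte[] buf = new byte[4096];
    long total = 0;
    int n;
    while ((n = in.read(buf)) > 0) {
      total += n;
      // process buf[0..n) here
    }
    in.close();
    System.out.println("read " + total + " bytes up to the current end-of-file");
    fs.close();
  }
}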

> In HDFS, sync() not yet guarantees data available to the new readers
> --------------------------------------------------------------------
>
>                 Key: HADOOP-4379
>                 URL: https://issues.apache.org/jira/browse/HADOOP-4379
>             Project: Hadoop Core
>          Issue Type: New Feature
>          Components: dfs
>            Reporter: Tsz Wo (Nicholas), SZE
>            Assignee: dhruba borthakur
>             Fix For: 0.19.1
>
>         Attachments: 4379_20081010TC3.java, fsyncConcurrentReaders.txt, 
> fsyncConcurrentReaders3.patch, fsyncConcurrentReaders4.patch, 
> hypertable-namenode.log.gz, Reader.java, Reader.java, Writer.java, Writer.java
>
>
> In the append design doc 
> (https://issues.apache.org/jira/secure/attachment/12370562/Appends.doc), it 
> says
> * A reader is guaranteed to be able to read data that was 'flushed' before 
> the reader opened the file
> However, this feature is not yet implemented.  Note that the operation 
> 'flushed' is now called "sync".
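
For reference, the following is a minimal sketch of the behaviour that the 
quoted guarantee describes (the feature this issue tracks), assuming the 
0.19-era FSDataOutputStream.sync() call (renamed hflush() in later releases); 
the path and class name are illustrative only:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class SyncVisibilityDemo {
  public static void main(String[] args) throws IOException {
    Configuration conf = new Configuration();
    FileSystem fs = FileSystem.get(conf);
    Path path = new Path("/tmp/sync-visibility-demo");   // illustrative path

    // Writer: push some bytes to the datanodes with sync() and keep the
    // file open.
    FSDataOutputStream out = fs.create(path, true);
    out.writeBytes("flushed before the reader opened the file\n");
    out.sync();

    // Reader: opens the file *after* the sync(). Per the quoted guarantee,
    // this open must be able to read the bytes flushed above even though
    // the writer has not closed the file. (In practice the reader would
    // usually be a separate client or process.)
    FSDataInputStream in = fs.open(path);
    byte[] buf = new byte[4096];
    int n = in.read(buf);
    System.out.println("reader saw " + n + " bytes");
    in.close();

    out.close();
    fs.close();
  }
}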

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.
