[ https://issues.apache.org/jira/browse/HADOOP-6307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12764444#action_12764444 ]
dhruba borthakur commented on HADOOP-6307:
------------------------------------------
Isn't it true that fs.getFileStatus(file).getLen() requires read access on the
parent directory, whereas fs.open(file).available() requires read access on the
file itself?
Many map-reduce programs use SequenceFiles to store data, and they do not need
to process files that are still being written. In that case, isn't the
additional overhead of fetching block locations via fs.open(file) kinda
wasteful?
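A rough way to picture the trade-off, written as a hypothetical Java model rather than Hadoop code: each lookup records the round trips it is assumed to make. The call labels, the `needUnclosedSupport` flag and the class itself are illustrative assumptions, not DFSClient or SequenceFile APIs; the model also assumes that opening an un-closed file costs one extra datanode call for the last block's length (the HDFS-570 behaviour).

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical cost model (not Hadoop code) of the two ways of getting a
 * file's length. Each lookup appends a label for every round trip it is
 * assumed to make; the labels are descriptive, not real RPC names.
 */
public class LengthLookupCost {

    /** Round trips made so far, in order. */
    final List<String> calls = new ArrayList<>();

    /** Models fs.getFileStatus(file).getLen(): one namenode call for metadata. */
    void viaFileStatus() {
        calls.add("namenode: file status");
    }

    /**
     * Models fs.open(file).available(): the namenode returns block locations,
     * and for an un-closed file a datanode is also asked for the length of
     * the last block (assumed HDFS-570 behaviour).
     */
    void viaOpen(boolean fileIsClosed) {
        calls.add("namenode: block locations");
        if (!fileIsClosed) {
            calls.add("datanode: last block length");
        }
    }

    /** Uses the cheaper metadata path unless the caller must see hflushed data. */
    void lookup(boolean needUnclosedSupport, boolean fileIsClosed) {
        if (needUnclosedSupport) {
            viaOpen(fileIsClosed);
        } else {
            viaFileStatus();
        }
    }

    public static void main(String[] args) {
        // A typical map-reduce input: a closed file that nobody needs to tail.
        LengthLookupCost batch = new LengthLookupCost();
        batch.lookup(false, true);
        System.out.println(batch.calls); // [namenode: file status]

        // A reader that wants the hflushed tail of a file still being written.
        LengthLookupCost tail = new LengthLookupCost();
        tail.lookup(true, false);
        System.out.println(tail.calls); // [namenode: block locations, datanode: last block length]
    }
}
```

For a closed file both paths make a single namenode call in this model; the extra cost of the open path there is the block-location payload, which a simple call count does not capture.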
> Support reading on un-closed SequenceFile
> -----------------------------------------
>
> Key: HADOOP-6307
> URL: https://issues.apache.org/jira/browse/HADOOP-6307
> Project: Hadoop Common
> Issue Type: Improvement
> Components: io
> Reporter: Tsz Wo (Nicholas), SZE
>
> When a SequenceFile.Reader is constructed, it calls
> fs.getFileStatus(file).getLen(). However, fs.getFileStatus(file).getLen()
> does not return the hflushed length for an un-closed file, since the Namenode
> does not know the hflushed length. DFSClient has to ask a datanode for the
> length of the last block, which is being written; see also HDFS-570.
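To make the length gap concrete, here is a small hypothetical Java model (not Hadoop code). It assumes, as a simplification, that the namenode knows the sizes of finalized blocks but records nothing for the block still being written, while the hflushed bytes of that block are known only to the datanodes. `ModelFile`, `namenodeLength` and `visibleLength` are illustrative names, not HDFS APIs.

```java
/**
 * Hypothetical model (not Hadoop code) of the length gap described in
 * HADOOP-6307, under the simplifying assumption that the namenode records
 * nothing for the block still being written.
 */
public class UnclosedLengthModel {

    /** A file under construction: finalized blocks plus an hflushed tail. */
    static final class ModelFile {
        final long[] finalizedBlockSizes; // lengths the namenode knows about
        final long lastBlockHflushed;     // bytes only the datanodes know about

        ModelFile(long[] finalizedBlockSizes, long lastBlockHflushed) {
            this.finalizedBlockSizes = finalizedBlockSizes.clone();
            this.lastBlockHflushed = lastBlockHflushed;
        }
    }

    /** What a metadata-only lookup (like getFileStatus(file).getLen()) can report. */
    static long namenodeLength(ModelFile f) {
        long total = 0;
        for (long size : f.finalizedBlockSizes) {
            total += size;
        }
        return total;
    }

    /** What a reader can actually read once a datanode reports the last block's length. */
    static long visibleLength(ModelFile f) {
        return namenodeLength(f) + f.lastBlockHflushed;
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024; // 64 MB blocks, chosen for illustration
        ModelFile f = new ModelFile(new long[] {blockSize, blockSize}, 1000L);
        System.out.println(namenodeLength(f)); // 134217728
        System.out.println(visibleLength(f));  // 134218728
    }
}
```

A reader sized by the first number would stop 1000 bytes short of data the writer has already hflushed.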