[ https://issues.apache.org/jira/browse/HADOOP-11417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293487#comment-14293487 ]

Steve Loughran commented on HADOOP-11417:
-----------------------------------------

Looking at the HDFS code, the {{seek()}} logic is:
{code}
    if (targetPos > getFileLength()) {
      throw new EOFException("Cannot seek after EOF");
    }
    if (targetPos < 0) {
      throw new EOFException("Cannot seek to negative offset");
    }
    if (closed) {
      throw new IOException("Stream is closed!");
    }
{code}

That is: it is not an error to {{seek(len(file))}}.
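A minimal standalone sketch of the same checks, in the same order. The {{SeekPreconditions}} class is hypothetical, not HDFS code. It shows that a target of exactly the file length passes, while anything beyond it raises {{EOFException}}:

{code:java}
import java.io.EOFException;
import java.io.IOException;

/**
 * Hypothetical helper mirroring the HDFS seek() checks quoted above.
 * A sketch for illustration, not the actual DFS input stream code.
 */
final class SeekPreconditions {
    private SeekPreconditions() {}

    /** Returns normally if seeking to targetPos is allowed; throws otherwise. */
    static void check(long targetPos, long fileLength, boolean closed)
            throws IOException {
        // Strictly greater-than: targetPos == fileLength is accepted.
        if (targetPos > fileLength) {
            throw new EOFException("Cannot seek after EOF");
        }
        if (targetPos < 0) {
            throw new EOFException("Cannot seek to negative offset");
        }
        // As in the quoted code, the closed check comes after the range checks.
        if (closed) {
            throw new IOException("Stream is closed!");
        }
    }
}
{code}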

Instead, the end-of-file case is handled in the {{read()}} operation:
{code}
    if (pos < getFileLength()) {
      // ... the read logic, which appears to either return the data read or throw an exception
    }
    return -1;
{code}

That is: you can seek to the length of a file; the subsequent {{read()}} then returns -1 rather than failing.
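To make the combined seek/read behaviour concrete, here is a sketch of an in-memory stream with the HDFS semantics described above. {{InMemorySeekableStream}} is a hypothetical class, not part of Hadoop: {{seek(len)}} is legal, and {{read()}} at that position returns -1.

{code:java}
import java.io.EOFException;
import java.io.IOException;

/**
 * Hypothetical in-memory stream following the HDFS semantics above:
 * seek(len) is legal, and read() at or past the end returns -1.
 */
final class InMemorySeekableStream {
    private final byte[] data;
    private long pos;

    InMemorySeekableStream(byte[] data) {
        this.data = data.clone();
    }

    void seek(long targetPos) throws IOException {
        if (targetPos > data.length) {   // seek(len) itself is allowed
            throw new EOFException("Cannot seek after EOF");
        }
        if (targetPos < 0) {
            throw new EOFException("Cannot seek to negative offset");
        }
        pos = targetPos;
    }

    long getPos() {
        return pos;
    }

    /** Returns the next byte as 0..255, or -1 once pos has reached the end. */
    int read() {
        if (pos < data.length) {
            return data[(int) pos++] & 0xFF;
        }
        return -1;
    }
}
{code}

For a three-byte file, {{seek(3)}} succeeds and leaves {{getPos()}} at 3, and the next {{read()}} returns -1.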

h3. Conclusions

The FS spec is wrong: it says filesystems MAY throw an exception for any seek >= len(file). It currently says:

{code}
    s > 0 and ((s==0) or ((s < len(data)))) else raise [EOFException, IOException]

Some FileSystems do not raise an exception if this condition is not met. They
instead return -1 on any `read()` operation where, at the time of the read,
`len(data(FSDIS)) < pos(FSDIS)`.
{code}

It should instead have the condition:
{code}
    s >= 0 and s <= len(data) else raise [EOFException, IOException]
{code}

This matches HDFS, and it also covers what was previously treated as a special case: {{seek(0)}} is always valid.
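The same precondition can be written as a pure predicate. This is a sketch; the {{SeekSpec.isValidSeek}} helper is hypothetical. It encodes the HDFS checks quoted at the top: reject negative offsets and offsets strictly greater than the file length.

{code:java}
/**
 * Hypothetical predicate for the seek precondition, matching the HDFS
 * checks: a seek target s is valid iff 0 <= s <= len(data).
 */
final class SeekSpec {
    private SeekSpec() {}

    static boolean isValidSeek(long s, long len) {
        // Inclusive upper bound: seeking to exactly len is allowed.
        return s >= 0 && s <= len;
    }
}
{code}

Because the upper bound is inclusive, {{seek(0)}} needs no special case: {{0 <= len(data)}} holds for every file, including an empty one. A strict {{s < len(data)}} bound would reject both {{seek(len(file))}} and {{seek(0)}} on an empty file.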

As HADOOP-11270 notes, at least one of the object stores does not follow HDFS behaviour. Apart from a special test for {{seek(0)}}, {{AbstractContractSeekTest}} does not test the case {{seek(len(file))}}. It does test {{seek(len(file)+2)}}, which goes far enough past the end to be unambiguous.

Proposed actions:

# correct the spec to match HDFS
# add a new test in {{AbstractContractSeekTest}} which declares that all filesystem clients must support {{seek(len(file))}}.
# see what fails. 
# fix them.
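A rough sketch of what the proposed check could assert. It is written against a hypothetical {{SeekableInput}} interface rather than the real {{AbstractContractSeekTest}} / {{FSDataInputStream}} machinery, so that it stands alone: {{seek(len(file))}} must not throw, and the following {{read()}} must return -1.

{code:java}
import java.io.IOException;

/** Hypothetical minimal view of a seekable input stream. */
interface SeekableInput {
    void seek(long pos) throws IOException;
    int read() throws IOException;
}

/**
 * Sketch of the proposed contract check; not the actual
 * AbstractContractSeekTest code.
 */
final class SeekToEndCheck {
    private SeekToEndCheck() {}

    static void verifySeekToEnd(SeekableInput in, long fileLength)
            throws IOException {
        // Must not throw: seeking to exactly the file length is legal.
        in.seek(fileLength);
        // At EOF, read() reports end-of-stream rather than failing.
        int b = in.read();
        if (b != -1) {
            throw new AssertionError("expected -1 at EOF, read " + b);
        }
    }
}
{code}

A client that rejects {{seek(len(file))}} surfaces here as an {{IOException}} from {{verifySeekToEnd}}, which is the kind of failure the "see what fails" step is meant to find.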





> review filesystem seek logic, clarify/confirm spec, test & fix compliance
> -------------------------------------------------------------------------
>
>                 Key: HADOOP-11417
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11417
>             Project: Hadoop Common
>          Issue Type: Task
>          Components: fs, fs/s3, fs/swift
>    Affects Versions: 2.6.0
>            Reporter: Steve Loughran
>            Assignee: Steve Loughran
>
> HADOOP-11270 implies there's a difference between the way HDFS and the object
> stores handle {{seek(len(file))}}
> # review what HDFS does; add a contract test that exactly demonstrates HDFS behaviour.
> # ensure FS spec is consistent with this
> # test/audit all supported filesystems to verify consistent behaviour
> # fix where appropriate



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
