[
https://issues.apache.org/jira/browse/HADOOP-11417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293487#comment-14293487
]
Steve Loughran commented on HADOOP-11417:
-----------------------------------------
Looking at the HDFS code, the logic is
{code}
if (targetPos > getFileLength()) {
  throw new EOFException("Cannot seek after EOF");
}
if (targetPos < 0) {
  throw new EOFException("Cannot seek to negative offset");
}
if (closed) {
  throw new IOException("Stream is closed!");
}
{code}
That is: it is not an error to {{seek(len(file))}}.
Instead, on the {{read()}} operation, it goes
{code}
if (pos < getFileLength()) {
  // ... the read logic, which appears to either return success or throw something
}
return -1;
{code}
That is: you can seek to the length of a file; the {{read()}} operation then returns -1.
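The behaviour described above can be sketched with a small in-memory stream. This is only an illustration: {{InMemorySeekableStream}} is a hypothetical class, not the HDFS implementation, and it copies nothing from HDFS beyond the checks quoted in this comment.

```java
import java.io.EOFException;
import java.io.IOException;

// Hypothetical in-memory stream; it only mirrors the checks quoted from HDFS.
class InMemorySeekableStream {
    private final byte[] data;
    private long pos = 0;
    private boolean closed = false;

    InMemorySeekableStream(byte[] data) {
        this.data = data.clone();
    }

    long getFileLength() {
        return data.length;
    }

    void close() {
        closed = true;
    }

    // Same checks, in the same order, as the quoted seek logic.
    void seek(long targetPos) throws IOException {
        if (targetPos > getFileLength()) {
            throw new EOFException("Cannot seek after EOF");
        }
        if (targetPos < 0) {
            throw new EOFException("Cannot seek to negative offset");
        }
        if (closed) {
            throw new IOException("Stream is closed!");
        }
        pos = targetPos;
    }

    // Returns the next byte as 0..255, or -1 once pos has reached the length.
    int read() {
        if (pos < getFileLength()) {
            int b = data[(int) pos] & 0xff;
            pos++;
            return b;
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        InMemorySeekableStream in = new InMemorySeekableStream(new byte[] {10, 20, 30});
        in.seek(in.getFileLength());   // accepted: position == length
        System.out.println(in.read()); // prints -1
        try {
            in.seek(in.getFileLength() + 1);
        } catch (EOFException e) {
            System.out.println("EOFException: " + e.getMessage());
        }
    }
}
```

The seek to the file length succeeds and only the following {{read()}} reports end of file, while a seek one byte further is rejected immediately.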
h3. Conclusions
The FS spec is wrong: it says filesystems MAY throw an exception for any seek
>= len(file). It currently reads
{code}
s > 0 and ((s==0) or ((s < len(data)))) else raise [EOFException, IOException]
Some FileSystems do not raise an exception if this condition is not met. They
instead return -1 on any `read()` operation where, at the time of the read,
`len(data(FSDIS)) < pos(FSDIS)`.
{code}
Instead, it should have the condition
{code}
s >= 0 and s <= len(data) else raise [EOFException, IOException]
{code}
This matches HDFS and covers what was treated as a special case: {{seek(0)}} is
always valid.
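As a rough sketch, the precondition can be written as a predicate derived from the HDFS checks quoted above, which reject only {{targetPos > getFileLength()}} and {{targetPos < 0}}. The class and method names are hypothetical and exist only for illustration.

```java
// Hypothetical helper; encodes which seek positions the quoted HDFS checks accept.
class SeekPrecondition {
    // True when seek(s) would be accepted for a file of length len:
    // any position from 0 up to and including len.
    static boolean isValidSeek(long s, long len) {
        return s >= 0 && s <= len;
    }
}
```

Under this predicate {{seek(0)}} is valid even for an empty file, so it needs no special case.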
As HADOOP-11270 notes, at least one of the object stores does not follow HDFS
behaviour. Apart from a special test for {{seek(0)}}, {{AbstractContractSeekTest}}
does not test the case {{seek(len(file))}}. It does test {{seek(len(file)+2)}},
going far enough past the end to resolve any ambiguity.
Proposed:
# correct the spec to match HDFS
# add a new test in {{AbstractContractSeekTest}} which declares that all
filesystem clients must support {{seek(len(file))}}.
# see what fails.
# fix them.
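The proposed contract check could look roughly like the sketch below. It is not code from {{AbstractContractSeekTest}}: {{SeekableInput}}, {{SeekToLengthCheck}} and {{ArrayInput}} are hypothetical names, and a real test would run against the stream returned by each filesystem client rather than an in-memory array.

```java
import java.io.EOFException;
import java.io.IOException;

// Hypothetical minimal view of a seekable input stream.
interface SeekableInput {
    void seek(long pos) throws IOException;
    int read() throws IOException;
}

class SeekToLengthCheck {
    // Passes only if seek(len) is accepted and the next read() reports EOF with -1.
    static void verify(SeekableInput in, long fileLength) throws IOException {
        in.seek(fileLength); // must not throw
        int b = in.read();
        if (b != -1) {
            throw new AssertionError("read() at len(file) returned " + b + ", expected -1");
        }
    }
}

// Reference stream that follows the HDFS checks quoted in this comment.
class ArrayInput implements SeekableInput {
    private final byte[] data;
    private long pos = 0;

    ArrayInput(byte[] data) {
        this.data = data.clone();
    }

    @Override
    public void seek(long targetPos) throws IOException {
        if (targetPos > data.length) {
            throw new EOFException("Cannot seek after EOF");
        }
        if (targetPos < 0) {
            throw new EOFException("Cannot seek to negative offset");
        }
        pos = targetPos;
    }

    @Override
    public int read() {
        if (pos < data.length) {
            int b = data[(int) pos] & 0xff;
            pos++;
            return b;
        }
        return -1;
    }
}
```

A client that rejects {{seek(len(file))}} fails the check with the exception thrown by its own {{seek()}}, which is the "see what fails" step.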
> review filesystem seek logic, clarify/confirm spec, test & fix compliance
> -------------------------------------------------------------------------
>
> Key: HADOOP-11417
> URL: https://issues.apache.org/jira/browse/HADOOP-11417
> Project: Hadoop Common
> Issue Type: Task
> Components: fs, fs/s3, fs/swift
> Affects Versions: 2.6.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
>
> HADOOP-11270 implies that HDFS and the object stores differ in how they
> handle the action {{seek(len(file))}}
> # review what HDFS does, add contract test to exactly demonstrate HDFS
> behaviour.
> # ensure FS spec is consistent with this
> # test/audit all supported filesystems to verify consistent behaviour
> # fix where appropriate