[
https://issues.apache.org/jira/browse/HADOOP-9371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13635220#comment-13635220
]
Steve Loughran commented on HADOOP-9371:
----------------------------------------
We also need to specify {{Seekable}}, as the {{FSDataInputStream}} which must
be returned from {{open()}} calls implements it, and the specifics of
{{seek(long pos)}} are not completely defined, consistently implemented, or
explicitly tested.
* Some implementation classes validate the range of a seek in the call itself;
validation can also be postponed until the next {{read()}} (which is how POSIX
expects it).
* Not every implementation rejects negative seek offsets.
* While {{EOFException}} would be the appropriate exception to raise on going
past the end of the file, it is rarely seen in the source.
Delayed seeks can deliver tangible performance benefits and it would be unwise
to demand stricter validation than {{::lseek()}} or {{::SetFilePointerEx()}}.
We ought to say "you can if you want", and write tests that verify either the
seek fails, or the read straight afterwards fails.
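A minimal sketch of such an "either the seek or the next read fails" check, in Java. The class {{SeekOrReadFails}}, the {{IoAction}} interface and the {{failingCall}} helper are hypothetical names introduced for illustration only; they are not part of Hadoop.

```java
import java.io.IOException;

public class SeekOrReadFails {
    /** Hypothetical wrapper for one stream call that may throw an IOException. */
    @FunctionalInterface
    interface IoAction {
        void run() throws IOException;
    }

    /**
     * Runs the seek, then the read that follows it.
     * Returns "seek" or "read" to name the call that raised the IOException,
     * and fails if neither call raised one.
     */
    static String failingCall(IoAction seek, IoAction readAfterSeek) {
        try {
            seek.run();
        } catch (IOException e) {
            return "seek";   // eager validation: rejected inside seek()
        }
        try {
            readAfterSeek.run();
        } catch (IOException e) {
            return "read";   // delayed validation: rejected on the next read
        }
        throw new AssertionError("neither the seek nor the following read failed");
    }
}
```

With an open stream {{in}} and a known file length {{len}}, a call might look like {{failingCall(() -> in.seek(len + 1), () -> in.read())}}; either return value would count as a pass.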
== Seekable ==
* When a file is opened, {{getPos()}} MUST equal 0
* Implementations MAY choose not to support {{seek()}}, and MAY instead throw
an {{IOException}}
* A {{seek(L)}} on a closed input stream MUST fail with an {{IOException}}.
* After a successful {{seek(L)}}, {{getPos()==L}} for all L: {{0 <= L <
length(file)}}
* On a {{seek(L)}} with L<0 an exception MUST be thrown. It SHOULD be an
{{IOException}}. It MAY be {{IllegalArgumentException}} or other
{{RuntimeException}}
* On a {{seek(L)}} with L>length(file), an {{IOException}} MAY be thrown. It
SHOULD be an {{EOFException}}
* If the {{seek()}} does not throw an {{IOException}}, then an {{IOException}}
MUST be thrown on the next {{read()}} operation. It SHOULD be an
{{EOFException}}
This is actually a relaxation of the {{Seekable.seek()}} definition, which
states "Can't seek past the end of the file". The {{RawLocalFileSystem}}, on
which everything ultimately depends, does support seeking past the end of the
file; an exception is raised only on the read operation.
* After a {{seek(L)}} with {{L<length(file)}}, {{read()}} returns the byte at
position L in the file.
* After a {{seek(L)}} with {{L==length(file)}}, {{read()}} returns -1
* After a {{seek(L)}} with {{L<length(file)}}, {{read(byte[1],0,1)}} reads
the byte at position L in the file.
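To make the rules above concrete, here is a minimal in-memory model of a stream that follows them with delayed validation. {{InMemorySeekable}} is a hypothetical class written for this illustration, not a Hadoop class; it assumes the whole file fits in a byte array and picks {{IOException}} for negative offsets and {{EOFException}} for reads past the end.

```java
import java.io.EOFException;
import java.io.IOException;

public class InMemorySeekable {
    private final byte[] data;
    private long pos = 0;            // getPos() is 0 when the file is opened
    private boolean closed = false;

    public InMemorySeekable(byte[] data) {
        this.data = data.clone();
    }

    public long getPos() {
        return pos;
    }

    public void seek(long l) throws IOException {
        if (closed) throw new IOException("stream is closed");
        if (l < 0) throw new IOException("negative seek offset: " + l);
        pos = l;                     // past-EOF offsets are accepted here, checked on read
    }

    public int read() throws IOException {
        checkReadable();
        if (pos == data.length) return -1;       // exactly at EOF
        return data[(int) pos++] & 0xff;
    }

    public int read(byte[] buf, int off, int len) throws IOException {
        checkReadable();
        if (len == 0) return 0;
        if (pos == data.length) return -1;       // exactly at EOF
        int n = (int) Math.min(len, data.length - pos);
        System.arraycopy(data, (int) pos, buf, off, n);
        pos += n;
        return n;
    }

    public void close() {
        closed = true;
    }

    // Delayed validation: a seek past EOF only surfaces on the next read.
    private void checkReadable() throws IOException {
        if (closed) throw new IOException("stream is closed");
        if (pos > data.length) throw new EOFException("position " + pos + " is past end of file");
    }
}
```

An eager implementation would move the {{pos > data.length}} check into {{seek()}}; both variants fit the rules as written.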
Tests to verify offset validation
# Open a file of length {{file_len > 0}}, verify {{getPos()==0}}
# {{seek(file_len)}}, verify {{getPos()==file_len}}
If an exception is not raised, {{read()}} and expect an {{IOException}}
# {{seek(file_len+1)}}, expect an {{EOFException}}
If an exception is not raised, {{read()}} and expect the exception then
# {{seek(-1)}}, expect an {{IOException}} immediately.
Open a file of length {{file_len == 0}}:
# verify {{getPos()==0}}
# Verify that {{seek(0)}} succeeds.
# verify that {{read()}} returns -1.
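As a rough sketch of how the zero-length-file steps and the negative-offset step could be coded, the example below uses {{java.io.RandomAccessFile}} on a local temporary file as a stand-in for a filesystem's input stream. That substitution is an assumption made for illustration: {{getFilePointer()}} plays the role of {{getPos()}}, and the class and method names are hypothetical. The past-end-of-file steps are left out, since {{RandomAccessFile}} does not follow the proposed rules there.

```java
import java.io.File;
import java.io.IOException;
import java.io.RandomAccessFile;

public class EmptyFileSeekCheck {
    /** Zero-length file: position starts at 0, seek(0) succeeds, read() returns -1. */
    static boolean emptyFileChecks() throws IOException {
        File f = File.createTempFile("seek-empty", ".bin");
        try (RandomAccessFile in = new RandomAccessFile(f, "r")) {
            boolean startsAtZero = in.getFilePointer() == 0;
            in.seek(0);                       // must not throw
            boolean readIsEof = in.read() == -1;
            return startsAtZero && readIsEof;
        } finally {
            f.delete();
        }
    }

    /** seek(-1) must fail immediately with an IOException. */
    static boolean negativeSeekRejected() throws IOException {
        File f = File.createTempFile("seek-negative", ".bin");
        try (RandomAccessFile in = new RandomAccessFile(f, "r")) {
            try {
                in.seek(-1);
                return false;                 // no exception: the check fails
            } catch (IOException expected) {
                return true;
            }
        } finally {
            f.delete();
        }
    }
}
```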
Test to verify {{seek()}} actually changes the location for future reads.
* verify that after a {{seek()}}, {{read()}} returns the data at the seek
location. This must work for forward and backwards seeks.
* verify that after a {{seek()}}, a {{read(byte[])}} returns the bytes of data
at the seek location. This must work for forward and backwards seeks.
Repeat for very large offsets (e.g. 128KB file), to ensure that filesystems
with local caches/buffers handle longer range seeks correctly.
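One way to sketch the forward/backward seek test: fill a 128KB file with a deterministic pattern so that the expected byte at any offset is known, then seek to a mix of offsets in non-monotonic order and compare. As an assumption for illustration, {{java.io.RandomAccessFile}} on a local temporary file stands in for the filesystem stream, and {{readFully()}} is used in place of a bare {{read(byte[])}} so that a short read cannot cause a false failure. All class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.file.Files;
import java.nio.file.Path;

public class SeekMovesReads {
    static final int FILE_LEN = 128 * 1024;

    /** Deterministic content: the byte stored at a given offset. */
    static byte expectedAt(long offset) {
        return (byte) (offset % 251);
    }

    /**
     * Seeks to each offset in turn (callers mix forward and backward jumps) and
     * checks both read() and a 16-byte bulk read against the known pattern.
     * Every offset must be at most FILE_LEN - 16.
     */
    static boolean seekThenReadMatches(long[] offsets) throws IOException {
        byte[] data = new byte[FILE_LEN];
        for (int i = 0; i < data.length; i++) {
            data[i] = expectedAt(i);
        }
        Path p = Files.createTempFile("seek-data", ".bin");
        try {
            Files.write(p, data);
            try (RandomAccessFile in = new RandomAccessFile(p.toFile(), "r")) {
                byte[] buf = new byte[16];
                for (long off : offsets) {
                    in.seek(off);
                    if (in.read() != (expectedAt(off) & 0xff)) return false;
                    in.seek(off);
                    in.readFully(buf);
                    for (int i = 0; i < buf.length; i++) {
                        if (buf[i] != expectedAt(off + i)) return false;
                    }
                }
            }
            return true;
        } finally {
            Files.deleteIfExists(p);
        }
    }
}
```

Calling it with offsets such as {{1024, 100000, 7, 131056, 0}} exercises a forward jump, a long forward jump, a backward jump, a seek near the end of the file, and a return to the start.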
> Define Semantics of FileSystem and FileContext more rigorously
> --------------------------------------------------------------
>
> Key: HADOOP-9371
> URL: https://issues.apache.org/jira/browse/HADOOP-9371
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs
> Affects Versions: 1.2.0, 3.0.0
> Reporter: Steve Loughran
> Assignee: Steve Loughran
> Attachments: HADOOP-9361.2.patch, HADOOP-9361.patch,
> HadoopFilesystemContract.pdf
>
> Original Estimate: 48h
> Remaining Estimate: 48h
>
> The semantics of {{FileSystem}} and {{FileContext}} are not completely
> defined in terms of
> # core expectations of a filesystem
> # consistency requirements.
> # concurrency requirements.
> # minimum scale limits
> Furthermore, methods are not defined strictly enough in terms of their
> outcomes and failure modes.
> The requirements and method semantics should be defined more strictly.