[
https://issues.apache.org/jira/browse/HADOOP-9713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098266#comment-17098266
]
Mikhail Pryakhin commented on HADOOP-9713:
------------------------------------------
Another option is to defer the seek call until the next
`FSDataInputStream.read(long position, byte[] buffer, int offset, int length)`
invocation, making it lazy. Normally, each subsequent read continues from the
position where the previous read finished, so no seek is needed in that case.
We only need to seek when `FSDataInputStream.getPos()` differs from the
requested position. In the standard sequential-read scenario, this will
drastically reduce the number of seeks.
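
A minimal sketch of the lazy-seek idea, under one reading of the proposal: the
positioned read seeks only when the stream is not already at the requested
position, and does not seek back afterwards. `SeekableSource`, `CountingSource`
and `LazySeekReader` are hypothetical stand-ins written for this illustration;
they are not Hadoop classes and not a proposed patch.

```java
import java.io.EOFException;
import java.io.IOException;

/** Hypothetical stand-in for a seekable stream; NOT Hadoop's own interface. */
interface SeekableSource {
    void seek(long pos) throws IOException;
    long getPos() throws IOException;
    int read(byte[] buf, int off, int len) throws IOException;
}

/** In-memory source that counts real seeks so the saving is visible. */
class CountingSource implements SeekableSource {
    private final byte[] data;
    private int pos = 0;
    int seekCount = 0;

    CountingSource(byte[] data) { this.data = data; }

    @Override
    public void seek(long newPos) throws IOException {
        if (newPos < 0 || newPos > data.length) {
            throw new EOFException("Cannot seek to " + newPos);
        }
        seekCount++;
        pos = (int) newPos;
    }

    @Override
    public long getPos() { return pos; }

    @Override
    public int read(byte[] buf, int off, int len) {
        if (len == 0) return 0;
        if (pos >= data.length) return -1;          // end of stream
        int n = Math.min(len, data.length - pos);
        System.arraycopy(data, pos, buf, off, n);
        pos += n;
        return n;
    }
}

/** Positioned reader that seeks only when the stream is not at the target. */
class LazySeekReader {
    private final SeekableSource in;

    LazySeekReader(SeekableSource in) { this.in = in; }

    int read(long position, byte[] buffer, int offset, int length)
            throws IOException {
        if (in.getPos() != position) {
            in.seek(position);                      // non-sequential read: pay for one seek
        }
        // Assumption of this sketch: no seek back. The stream stays where the
        // read ended, so a following sequential read needs no seek at all.
        return in.read(buffer, offset, length);
    }
}
```

With this sketch, reading 4 bytes at position 0 and then 4 bytes at position 4
performs no seeks; a later read at position 1 performs exactly one. Note that
leaving the stream at the end of the read changes what `getPos()` returns
afterwards, which a real change would have to reconcile with the existing
positioned-read behaviour.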
> FSDataInputStream.readFully doesn't work on filesystems without seek -even
> when the offset==getPos
> --------------------------------------------------------------------------------------------------
>
> Key: HADOOP-9713
> URL: https://issues.apache.org/jira/browse/HADOOP-9713
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs
> Affects Versions: 2.1.0-beta, 1.3.0, 3.0.0-alpha1
> Reporter: Steve Loughran
> Priority: Minor
>
> {{FSDataInputStream.readFully(offset,data)}} doesn't work even if the
> offset==the current location -because it always seeks to the offset and seeks
> back. No seek => Exception.
> We could optimise {{FSDataInputStream.readFully(offset,data)}} to eliminate
> the seeks on these operations -which would have tangible benefits for those
> filesystems where seek is expensive (remote blobstores). It would also let
> you use readFully against filesystems without seeks, provided you are only
> reading from the current location.
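
The failure mode in the quoted description can be reproduced in miniature.
The sketch below assumes a hypothetical forward-only stream whose `seek` always
throws, and compares a seek-and-seek-back `readFully` with a variant that skips
the seeks when the offset equals the current position. `ForwardOnlyStream` and
`PositionedReads` are illustrative names, not Hadoop code.

```java
import java.io.EOFException;
import java.io.IOException;

/** Hypothetical forward-only stream: every seek fails, as on a store without seek. */
class ForwardOnlyStream {
    private final byte[] data;
    private int pos = 0;

    ForwardOnlyStream(byte[] data) { this.data = data; }

    long getPos() { return pos; }

    void seek(long target) throws IOException {
        throw new IOException("seek not supported");
    }

    int read(byte[] buf, int off, int len) {
        if (len == 0) return 0;
        if (pos >= data.length) return -1;          // end of stream
        int n = Math.min(len, data.length - pos);
        System.arraycopy(data, pos, buf, off, n);
        pos += n;
        return n;
    }
}

class PositionedReads {
    /** Always seeks to the offset and back, so it fails without seek support. */
    static void readFullyNaive(ForwardOnlyStream in, long offset, byte[] out)
            throws IOException {
        long old = in.getPos();
        in.seek(offset);
        try {
            fill(in, out);
        } finally {
            in.seek(old);
        }
    }

    /** Skips both seeks when the stream is already at the requested offset. */
    static void readFullyOptimised(ForwardOnlyStream in, long offset, byte[] out)
            throws IOException {
        if (in.getPos() == offset) {
            fill(in, out);                          // no seek needed
            return;
        }
        readFullyNaive(in, offset, out);            // fall back to the seeking path
    }

    /** Reads until the buffer is full or the stream ends. */
    private static void fill(ForwardOnlyStream in, byte[] out) throws IOException {
        int done = 0;
        while (done < out.length) {
            int n = in.read(out, done, out.length - done);
            if (n < 0) throw new EOFException("stream ended after " + done + " bytes");
            done += n;
        }
    }
}
```

On this stream, `readFullyNaive(in, 0, buf)` throws even though the stream is
at position 0, while `readFullyOptimised` succeeds for offsets that match the
current position and still fails for any offset that would need a real seek.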