[
https://issues.apache.org/jira/browse/HADOOP-9713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098266#comment-17098266
]
Mikhail Pryakhin edited comment on HADOOP-9713 at 5/3/20, 3:40 PM:
-------------------------------------------------------------------
Another option is to defer the seek call until the next
{code:java}
FSDataInputStream#read(long position, byte[] buffer, int offset, int length){code}
invocation, making it lazy. Normally each subsequent read continues from the
position where the previous read finished, so no seek is needed in that case.
We only need to seek when the current FSDataInputStream#getPos() differs from
the requested position. In the standard sequential read scenario, this will
drastically reduce the number of seeks.
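
A minimal sketch of the lazy-seek idea, not the actual Hadoop code: {{SeekableSource}}, {{CountingSource}}, {{LazySeekReader}} and {{LazySeekDemo}} are hypothetical names introduced only for illustration, and the in-memory source throws unchecked exceptions where a real stream would throw IOException. The sketch also does not restore the previous position after the read, which a real patch would have to reconcile with the positioned-read contract.

```java
/** Hypothetical minimal stream abstraction; NOT a Hadoop interface. */
interface SeekableSource {
    long getPos();
    void seek(long pos);
    int read(byte[] buffer, int offset, int length);
}

/** In-memory source that counts seeks so the saving is observable. */
final class CountingSource implements SeekableSource {
    private final byte[] data;
    private int pos;
    int seekCount;

    CountingSource(byte[] data) { this.data = data; }

    @Override public long getPos() { return pos; }

    @Override public void seek(long newPos) {
        // A real stream would throw IOException here; unchecked for brevity.
        if (newPos < 0 || newPos > data.length) {
            throw new IllegalArgumentException("Bad seek position: " + newPos);
        }
        seekCount++;
        pos = (int) newPos;
    }

    @Override public int read(byte[] buffer, int offset, int length) {
        if (length == 0) return 0;
        if (pos >= data.length) return -1;            // end of data
        int n = Math.min(length, data.length - pos);
        System.arraycopy(data, pos, buffer, offset, n);
        pos += n;                                     // position advances past the bytes read
        return n;
    }
}

/** Positioned read that seeks only when the stream is not already in place. */
final class LazySeekReader {
    private final SeekableSource in;

    LazySeekReader(SeekableSource in) { this.in = in; }

    int read(long position, byte[] buffer, int offset, int length) {
        if (in.getPos() != position) {
            in.seek(position);                        // only non-sequential reads pay for a seek
        }
        return in.read(buffer, offset, length);
    }
}

final class LazySeekDemo {
    public static void main(String[] args) {
        CountingSource src = new CountingSource(new byte[16]);
        LazySeekReader reader = new LazySeekReader(src);
        byte[] buf = new byte[4];
        reader.read(0, buf, 0, 4);   // already at 0: no seek
        reader.read(4, buf, 0, 4);   // continues where the last read ended: no seek
        reader.read(12, buf, 0, 4);  // jump ahead: one seek
        System.out.println("seeks = " + src.seekCount); // prints "seeks = 1"
    }
}
```

With sequential positioned reads the seek count stays at zero, so a source whose seek is unsupported would only fail on a genuinely non-sequential read.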
> FSDataInputStream.readFully doesn't work on filesystems without seek -even
> when the offset==getPos
> --------------------------------------------------------------------------------------------------
>
> Key: HADOOP-9713
> URL: https://issues.apache.org/jira/browse/HADOOP-9713
> Project: Hadoop Common
> Issue Type: Bug
> Components: fs
> Affects Versions: 2.1.0-beta, 1.3.0, 3.0.0-alpha1
> Reporter: Steve Loughran
> Assignee: Mikhail Pryakhin
> Priority: Minor
>
> {{FSDataInputStream.readFully(offset,data)}} doesn't work even if the
> offset is the current location, because it always seeks to the offset and
> then seeks back. No seek => Exception.
> We could optimise {{FSDataInputStream.readFully(offset,data)}} to eliminate
> the seeks on these operations, which would have tangible benefits for those
> filesystems where seek is expensive (remote blobstores). It would also let
> you use readFully against filesystems without seeks, provided you are only
> reading from the current location.