[jira] [Commented] (HADOOP-9713) FSDataInputStream.readFully doesn't work on filesystems without seek -even when the offset==getPos

Mikhail Pryakhin (Jira) Sat, 02 May 2020 23:47:18 -0700


    [ https://issues.apache.org/jira/browse/HADOOP-9713?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17098266#comment-17098266 ]


Mikhail Pryakhin commented on HADOOP-9713:
------------------------------------------

Another option is to make the seek lazy: defer it until the next 
`FSDataInputStream.read(long position, byte[] buffer, int offset, int length)` 
invocation. Subsequent reads normally continue from the position where the 
previous read finished, so no seek is needed in that case. We only need to 
seek when the requested position differs from `FSDataInputStream.getPos()`. 
In the standard read scenario, this will drastically reduce the number of 
seeks.
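
The idea can be illustrated with a minimal, self-contained sketch. It is not 
Hadoop code: `SeekableSource`, `CountingSource` and `lazyRead` are hypothetical 
names invented here, and the in-memory source exists only to count seeks. 
Unlike the behaviour described in the issue, the sketch does not seek back 
after the read; it leaves the stream where the read finished, so the next 
sequential read needs no seek.

```java
public class LazySeekDemo {
    /** Hypothetical minimal stream abstraction; not a Hadoop interface. */
    interface SeekableSource {
        long getPos();
        void seek(long pos);
        int read(byte[] buf, int off, int len);
    }

    /** In-memory source that counts seek calls, for illustration only. */
    static final class CountingSource implements SeekableSource {
        private final byte[] data;
        private long pos;
        int seekCount;

        CountingSource(byte[] data) {
            this.data = data;
        }

        public long getPos() {
            return pos;
        }

        public void seek(long newPos) {
            seekCount++;
            pos = newPos;
        }

        public int read(byte[] buf, int off, int len) {
            if (pos >= data.length) {
                return -1; // end of data
            }
            int n = (int) Math.min(len, data.length - pos);
            System.arraycopy(data, (int) pos, buf, off, n);
            pos += n;
            return n;
        }
    }

    /** Positioned read that seeks only if the stream is not already at the requested position. */
    static int lazyRead(SeekableSource in, long position, byte[] buf, int off, int len) {
        if (in.getPos() != position) {
            in.seek(position);
        }
        return in.read(buf, off, len);
    }

    public static void main(String[] args) {
        CountingSource src = new CountingSource(new byte[64]);
        byte[] buf = new byte[16];

        // Sequential positioned reads: each starts where the previous one ended.
        long position = 0;
        for (int i = 0; i < 4; i++) {
            position += lazyRead(src, position, buf, 0, buf.length);
        }
        System.out.println("seeks after sequential reads: " + src.seekCount); // 0

        // A read at a different position forces exactly one seek.
        lazyRead(src, 8, buf, 0, buf.length);
        System.out.println("seeks after one random read: " + src.seekCount); // 1
    }
}
```

In this sketch a filesystem without seek support would only fail on the 
non-sequential read, since `seek` is never called while the requested position 
matches the current one.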

> FSDataInputStream.readFully doesn't work on filesystems without seek -even 
> when the offset==getPos
> --------------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-9713
>                 URL: https://issues.apache.org/jira/browse/HADOOP-9713
>             Project: Hadoop Common
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 2.1.0-beta, 1.3.0, 3.0.0-alpha1
>            Reporter: Steve Loughran
>            Priority: Minor
>
> {{FSDataInputStream.readFully(offset,data)}} doesn't work even if the 
> offset==the current location -because it always seeks to the offset and seeks 
> back. No seek => Exception.
> We could optimise {{FSDataInputStream.readFully(offset,data)}} to eliminate 
> the seeks on these operations -which would have tangible benefits for those 
> filesystems where seek is expensive (remote blobstores). It would also let 
> you use readFully against filesystems without seeks, provided you are only 
> reading from the current location.



