[jira] [Commented] (HADOOP-11867) FS API: Add a high-performance vectored Read to FSDataInputStream API

Steve Loughran (JIRA) Thu, 13 Dec 2018 06:13:56 -0800


    [ 
https://issues.apache.org/jira/browse/HADOOP-11867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16720202#comment-16720202
 ]


Steve Loughran commented on HADOOP-11867:
-----------------------------------------

Owen - these all make sense. One more thing to consider:

What happens to your getPos() during and after any async read?

Proposed: all bets are off.

A key optimisation all object stores eventually do is lazy seek, because the 
spec for positionedRead() says it "doesn't change the value of getPos()". I say 
here: getPos() will be whatever we want, and if you want to do a sequential 
read, seek to where you want to start from.

That allows readers just to leave the stream wherever they last read 
something, and expose that value to the caller, if they choose to use it.
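
To make the lazy-seek idea concrete, here is a minimal sketch. It uses 
java.nio's SeekableByteChannel as a stand-in for an object-store stream; the 
class LazySeekReader and its method names are hypothetical and not part of any 
Hadoop API. seek() only records the target, and the real repositioning is 
deferred until the next read.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;

/** Hypothetical illustration of lazy seek; not a Hadoop class. */
final class LazySeekReader {
    private final SeekableByteChannel channel;
    private long nextReadPos;   // position the caller asked for
    private int physicalSeeks;  // how often the channel was really repositioned

    LazySeekReader(SeekableByteChannel channel) throws IOException {
        this.channel = channel;
        this.nextReadPos = channel.position();
    }

    /** Cheap: only remembers where the next read should start. */
    void seek(long pos) {
        if (pos < 0) {
            throw new IllegalArgumentException("negative seek position: " + pos);
        }
        nextReadPos = pos;
    }

    /** Reports the logical position, which may differ from the channel's. */
    long getPos() {
        return nextReadPos;
    }

    /** Repositions the channel only if needed, then reads into dst. */
    int read(ByteBuffer dst) throws IOException {
        if (channel.position() != nextReadPos) {
            channel.position(nextReadPos);
            physicalSeeks++;
        }
        int n = channel.read(dst);
        if (n > 0) {
            nextReadPos += n;
        }
        return n;
    }

    int physicalSeeks() {
        return physicalSeeks;
    }
}
```

Several seek() calls in a row cost nothing here; only the last one before a 
read() turns into a real reposition.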

This leads to the following constraints:
* if you mix vectored reads with seek and read, you have no guarantees about 
where getPos() is, or that it will be unchanged.
* if you execute a second vectored read while the first is still active, the 
outcome is undefined (or they both return, but there are no ordering 
guarantees; linearization?)
* if you cancel one vectored read, other reads may or may not be aborted.

If a file changes underneath:

# that change may or may not be detected.
# if it is, and it results in an error: no guarantees about when that 
happens. HADOOP-15625 looks at detecting this on S3A, BTW.
# there are no guarantees if/when the change results in new data being read in.
# just because one read returns the new data, it does not mean other 
outstanding reads will return new data too. (This is a get-out for eventually 
consistent filesystems: both Swift and S3 can return stale data after a read 
of fresh data, if your next GET is routed to a server with old data.)

Essentially: don't use this IO mechanism for reading changing files. If you 
do, things may break, and that is your problem.

Other fun topic: valid ranges?

* offset or length < 0: fail fast (validate all requests in the synchronous 
part of the call)
* length > byte buffer capacity: fail fast
* I'm assuming there's no "read to EOF" option: you need to know the EOF first.
* What if I do a read with a range past EOF? That ranged read fails with 
EOFException. Validation may be postponed until the read (i.e. not the 
synchronous part of the call)
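
The range rules above could be sketched roughly as follows. The class and 
method names are hypothetical, and the choice of IllegalArgumentException for 
the fail-fast cases is an assumption; the proposal only says "fail fast". The 
sketch also uses buffer.remaining() as the usable capacity, which is likewise 
an assumption. The EOF check is kept separate because it may only run once the 
read actually happens.

```java
import java.io.EOFException;
import java.nio.ByteBuffer;

/** Hypothetical range checks for a vectored read request. */
final class RangeChecks {
    private RangeChecks() {}

    /** Synchronous part of the call: reject obviously invalid requests. */
    static void validateRequest(long offset, int length, ByteBuffer buffer) {
        if (offset < 0) {
            throw new IllegalArgumentException("negative offset: " + offset);
        }
        if (length < 0) {
            throw new IllegalArgumentException("negative length: " + length);
        }
        // Assumption: the space left in the buffer is what counts as capacity.
        if (length > buffer.remaining()) {
            throw new IllegalArgumentException(
                "length " + length + " exceeds buffer space " + buffer.remaining());
        }
    }

    /** Possibly deferred part: needs the file length, so may run at read time. */
    static void checkAgainstEof(long offset, int length, long fileLength)
            throws EOFException {
        // Written as a subtraction so offset + length cannot overflow.
        if (offset > fileLength - length) {
            throw new EOFException(
                "range [" + offset + ", +" + length + ") is past EOF at " + fileLength);
        }
    }
}
```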

Error handling and retries
* streams may retry on connection problems
* if one read fails, impact on other reads is undefined (if they are running on 
separate threads, they may continue/succeed etc). 
That is: failures on ranged reads may be isolated.

In the default impl we'd have to consider what to do here. If one read was 
raising an exception (FileNotFound?) then the others may want to give up, or 
try again. Proposed: leave it to the underlying stream to decide what to do.
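
One way to picture isolated failures is to give each range its own future, so 
that an exception in one read does not touch the others. This is only a sketch 
under assumptions: RangeReader, IsolatedRangeReads and readRanges are 
hypothetical names, and nothing here claims to be how the final API behaves.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;

/** Hypothetical sketch: one future per range, failures stay per-range. */
final class IsolatedRangeReads {
    private IsolatedRangeReads() {}

    /** Hypothetical callback that reads a single range. */
    @FunctionalInterface
    interface RangeReader {
        byte[] read(long offset, int length) throws IOException;
    }

    static List<CompletableFuture<byte[]>> readRanges(
            RangeReader reader, long[] offsets, int[] lengths, Executor executor) {
        if (offsets.length != lengths.length) {
            throw new IllegalArgumentException("offsets and lengths differ in size");
        }
        List<CompletableFuture<byte[]>> results = new ArrayList<>();
        for (int i = 0; i < offsets.length; i++) {
            final long offset = offsets[i];
            final int length = lengths[i];
            CompletableFuture<byte[]> future = new CompletableFuture<>();
            executor.execute(() -> {
                try {
                    future.complete(reader.read(offset, length));
                } catch (Exception e) {
                    // Only this range's future fails; the others are unaffected.
                    future.completeExceptionally(e);
                }
            });
            results.add(future);
        }
        return results;
    }
}
```

A caller that wants "give up on everything" semantics can still build that on 
top, for example by cancelling the remaining futures once one has failed.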



> FS API: Add a high-performance vectored Read to FSDataInputStream API
> ---------------------------------------------------------------------
>
>                 Key: HADOOP-11867
>                 URL: https://issues.apache.org/jira/browse/HADOOP-11867
>             Project: Hadoop Common
>          Issue Type: New Feature
>          Components: hdfs-client
>    Affects Versions: 3.0.0
>            Reporter: Gopal V
>            Assignee: Owen O'Malley
>            Priority: Major
>              Labels: performance
>
> The most significant way to read from a filesystem efficiently is to let 
> the FileSystem implementation handle the seek behaviour underneath the API, 
> so that it can be as efficient as possible.
> A better approach to the seek problem is to provide a sequence of read 
> locations as part of a single call, while letting the system schedule/plan 
> the reads ahead of time.
> This is exceedingly useful for seek-heavy readers on HDFS, since this allows 
> for potentially optimizing away the seek-gaps within the FSDataInputStream 
> implementation.
> For seek+read systems with even more latency than locally-attached disks, 
> something like a {{readFully(long[] offsets, ByteBuffer[] chunks)}} would 
> take care of the seeks internally while reading chunk.remaining() bytes into 
> each chunk (which may be {{slice()}}ed off a bigger buffer).
> The base implementation can stub in this as a sequence of seeks + read() into 
> ByteBuffers, without forcing each FS implementation to override this in any 
> way.
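
The base implementation described in the issue text, a sequence of seeks plus 
reads into ByteBuffers, might look roughly like this. It is a sketch under 
assumptions: java.nio's SeekableByteChannel stands in for FSDataInputStream, 
and NaiveVectoredRead is a hypothetical class name; only the readFully 
signature shape comes from the issue description.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SeekableByteChannel;

/** Hypothetical fallback: one seek plus read loop per requested chunk. */
final class NaiveVectoredRead {
    private NaiveVectoredRead() {}

    /** Fills each chunk with chunk.remaining() bytes from its offset. */
    static void readFully(SeekableByteChannel channel, long[] offsets, ByteBuffer[] chunks)
            throws IOException {
        if (offsets.length != chunks.length) {
            throw new IllegalArgumentException("offsets and chunks differ in size");
        }
        for (int i = 0; i < offsets.length; i++) {
            channel.position(offsets[i]);          // the seek
            ByteBuffer chunk = chunks[i];
            while (chunk.hasRemaining()) {         // the read, repeated until full
                if (channel.read(chunk) < 0) {
                    throw new EOFException("EOF while reading range at " + offsets[i]);
                }
            }
        }
    }
}
```

A filesystem with a smarter strategy, such as coalescing nearby ranges or 
issuing them in parallel, would override this rather than rely on it.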



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
