[
https://issues.apache.org/jira/browse/HADOOP-18852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17757319#comment-17757319
]
Steve Loughran commented on HADOOP-18852:
-----------------------------------------
My unbuffer PR will pass down some of this, along with the split start/end. We
shouldn't bother prefetching past the end of a file split, should we?
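Something like this, roughly (a sketch only; splitEnd and blockNumber() are
placeholder names here, with the real values wired down through the openFile()
builder):
{code}
// sketch: clamp prefetching at the split boundary.
// splitEnd and blockNumber() are placeholders; the real values
// would be passed down through openFile().
long lastUsefulBlock = blockNumber(splitEnd - 1);
long nextBlock = blockNumber(readPos);
// never queue prefetches for blocks wholly beyond the split end
prefetchCount = (int) Math.max(0,
    Math.min(prefetchCount, lastUsefulBlock - nextBlock + 1));
{code}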
> S3ACachingInputStream.ensureCurrentBuffer(): lazy seek means all reads look
> like random IO
> ------------------------------------------------------------------------------------------
>
> Key: HADOOP-18852
> URL: https://issues.apache.org/jira/browse/HADOOP-18852
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.3.6
> Reporter: Steve Loughran
> Priority: Major
>
> Noticed in HADOOP-18184, but I think it's a big enough issue to be dealt with
> separately.
> # all seeks are lazy; no fetching is kicked off after an open
> # the first read is treated as an out-of-order read, so it cancels any active
> reads (I don't think there are any) and then asks for only 1 block
> {code}
> if (outOfOrderRead) {
>   LOG.debug("lazy-seek({})", getOffsetStr(readPos));
>   blockManager.cancelPrefetches();
>   // We prefetch only 1 block immediately after a seek operation.
>   prefetchCount = 1;
> }
> {code}
> * for any readFully() we should prefetch all blocks in the range requested
> * for other reads, we may want a bigger prefetch count than 1, depending on
> the split start/end and the file read policy (random, sequential, whole-file)
> * also, if a read lands in a block other than the current one, but one which
> is already being fetched or cached, is this really an out-of-order read to
> the extent that outstanding fetches should be cancelled? (see the sketch
> below)
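> A rough sketch of the direction, with hypothetical names only (isReadFully,
> readPolicy, blocksInRange(), isFetchingOrCached(), maxPrefetchBlocks); this
> is not the current S3A code:
> {code}
> int prefetchCount;
> if (isReadFully) {
>   // readFully() declares its whole range up front: fetch every block it covers
>   prefetchCount = blocksInRange(readPos, readPos + len);
> } else if (readPolicy == ReadPolicy.SEQUENTIAL) {
>   // sequential/whole-file policy: prefetch aggressively
>   prefetchCount = maxPrefetchBlocks;
> } else {
>   // random IO: keep it to a single block
>   prefetchCount = 1;
> }
> // a read landing in a block that is already being fetched or cached is
> // not really out-of-order; keep outstanding fetches alive in that case
> if (outOfOrderRead && !isFetchingOrCached(blockNumber(readPos))) {
>   blockManager.cancelPrefetches();
> }
> {code}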