[
https://issues.apache.org/jira/browse/HADOOP-18852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17757319#comment-17757319
]
Steve Loughran commented on HADOOP-18852:
-----------------------------------------
My unbuffer PR will pass down some of this, along with the split start/end. We
shouldn't bother prefetching past the end of a file split, should we?
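Something like this, roughly (a sketch only; splitEnd and blockNumber() are
placeholder names here, with the real values wired down through the openFile()
builder):
{code}
// sketch: clamp prefetching at the split boundary.
// splitEnd and blockNumber() are placeholders; the real values
// would be passed down through openFile().
long lastUsefulBlock = blockNumber(splitEnd - 1);
long nextBlock = blockNumber(readPos);
// never queue prefetches for blocks wholly beyond the split end
prefetchCount = (int) Math.max(0,
    Math.min(prefetchCount, lastUsefulBlock - nextBlock + 1));
{code}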
> S3ACachingInputStream.ensureCurrentBuffer(): lazy seek means all reads look
> like random IO
> ------------------------------------------------------------------------------------------
>
> Key: HADOOP-18852
> URL: https://issues.apache.org/jira/browse/HADOOP-18852
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.3.6
> Reporter: Steve Loughran
> Priority: Major
>
> Noticed in HADOOP-18184, but I think it's a big enough issue to be dealt with
> separately.
> # all seeks are lazy; no fetching is kicked off after an open
> # the first read is treated as an out-of-order read, so it cancels any active
> reads (I don't think there are any) and then asks for only 1 block
> {code}
> if (outOfOrderRead) {
>   LOG.debug("lazy-seek({})", getOffsetStr(readPos));
>   blockManager.cancelPrefetches();
>   // We prefetch only 1 block immediately after a seek operation.
>   prefetchCount = 1;
> }
> {code}
> * for any readFully() we should prefetch all blocks in the range requested
> * for other reads, we may want a bigger prefetch count than 1, depending on
> the split start/end and the file read policy (random, sequential, whole-file)
> * also, if a read lands in a block other than the current one, but one which
> is already being fetched or cached, is this really an out-of-order read to
> the extent that outstanding fetches should be cancelled? (see the sketch
> below)
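> A rough sketch of the direction, with hypothetical names only (isReadFully,
> readPolicy, blocksInRange(), isFetchingOrCached(), maxPrefetchBlocks); this
> is not the current S3A code:
> {code}
> int prefetchCount;
> if (isReadFully) {
>   // readFully() declares its whole range up front: fetch every block it covers
>   prefetchCount = blocksInRange(readPos, readPos + len);
> } else if (readPolicy == ReadPolicy.SEQUENTIAL) {
>   // sequential/whole-file policy: prefetch aggressively
>   prefetchCount = maxPrefetchBlocks;
> } else {
>   // random IO: keep it to a single block
>   prefetchCount = 1;
> }
> // a read landing in a block that is already being fetched or cached is
> // not really out-of-order; keep outstanding fetches alive in that case
> if (outOfOrderRead && !isFetchingOrCached(blockNumber(readPos))) {
>   blockManager.cancelPrefetches();
> }
> {code}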