[
https://issues.apache.org/jira/browse/HADOOP-18852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17755384#comment-17755384
]
Viraj Jasani commented on HADOOP-18852:
---------------------------------------
{quote}also, if a read is in a block other than the current one, but which is
already being fetched or cached, is this really an OOO read to the extent that
outstanding fetches should be cancelled?
{quote}
+1 to this. Now that I have checked some logs, I can see a lazy-seek for every
first seek + read on a given block:
{code:java}
DEBUG prefetch.S3ACachingInputStream (S3ACachingInputStream.java:ensureCurrentBuffer(141)) - lazy-seek(0:0)
DEBUG prefetch.S3ACachingInputStream (S3ACachingInputStream.java:ensureCurrentBuffer(141)) - lazy-seek(4:40960)
DEBUG prefetch.S3ACachingInputStream (S3ACachingInputStream.java:ensureCurrentBuffer(141)) - lazy-seek(3:30720)
DEBUG prefetch.S3ACachingInputStream (S3ACachingInputStream.java:ensureCurrentBuffer(141)) - lazy-seek(2:20480)
{code}
But the question is also valid: if the block was already being cached, why
cancel the outstanding fetches?
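
A minimal sketch of that idea, not the actual S3ACachingInputStream code: treat
a read as out of order only when the target block is neither the current block
nor one that is already cached or being fetched. The names LazySeekSketch,
blockOf and shouldCancelPrefetches are hypothetical, and the 10240-byte block
size is an assumption inferred from the offsets in the log lines above
(4:40960, 3:30720, 2:20480).

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class LazySeekSketch {

  /** Block number that contains the given byte offset. */
  static int blockOf(long offset, int blockSize) {
    return (int) (offset / blockSize);
  }

  /**
   * Hypothetical policy: cancel outstanding prefetches only when the read
   * lands in a block that is neither the current one nor already
   * cached or in flight.
   */
  static boolean shouldCancelPrefetches(int targetBlock, int currentBlock,
      Set<Integer> cachedOrInFlight) {
    if (targetBlock == currentBlock) {
      return false;
    }
    return !cachedOrInFlight.contains(targetBlock);
  }

  public static void main(String[] args) {
    int blockSize = 10240; // assumed, see lead-in
    Set<Integer> pending = new HashSet<>(Arrays.asList(2, 3, 4));

    // Read at offset 30720 falls in block 3, which is already pending.
    int near = blockOf(30720L, blockSize);
    System.out.println(shouldCancelPrefetches(near, 0, pending)); // false

    // Read at offset 92160 falls in block 9, which nobody is fetching.
    int far = blockOf(92160L, blockSize);
    System.out.println(shouldCancelPrefetches(far, 0, pending)); // true
  }
}
```

With a check like this, the seeks to blocks 4, 3 and 2 in the log would keep
their outstanding fetches as long as those blocks were already queued.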
> S3ACachingInputStream.ensureCurrentBuffer(): lazy seek means all reads look
> like random IO
> ------------------------------------------------------------------------------------------
>
> Key: HADOOP-18852
> URL: https://issues.apache.org/jira/browse/HADOOP-18852
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: fs/s3
> Affects Versions: 3.3.6
> Reporter: Steve Loughran
> Priority: Major
>
> noticed in HADOOP-18184, but I think it's a big enough issue to be dealt with
> separately.
> # all seeks are lazy; no fetching is kicked off after an open
> # the first read is treated as an out of order read, so cancels any active
> reads (don't think there are any) and then only asks for 1 block
> {code}
> if (outOfOrderRead) {
>   LOG.debug("lazy-seek({})", getOffsetStr(readPos));
>   blockManager.cancelPrefetches();
>   // We prefetch only 1 block immediately after a seek operation.
>   prefetchCount = 1;
> }
> {code}
> * for any read fully we should prefetch all blocks in the range requested
> * for other reads, we may want a bigger prefetch count than 1, depending on:
> split start/end, file read policy (random, sequential, whole-file)
> * also, if a read is in a block other than the current one, but which is
> already being fetched or cached, is this really an OOO read to the extent
> that outstanding fetches should be cancelled?
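
The first two bullets above can be sketched as follows. This is an illustration
under stated assumptions, not the S3A implementation: PrefetchRangeSketch, the
ReadPolicy enum, and the particular counts chosen per policy are all
hypothetical, and length is assumed to be greater than zero.

```java
final class PrefetchRangeSketch {

  /** Hypothetical stand-in for the file read policy named in the issue. */
  enum ReadPolicy { RANDOM, SEQUENTIAL, WHOLE_FILE }

  /** First block touched by a read starting at position. */
  static int firstBlock(long position, int blockSize) {
    return (int) (position / blockSize);
  }

  /** Last block touched by a read of length bytes (length > 0 assumed). */
  static int lastBlock(long position, int length, int blockSize) {
    long lastByte = position + length - 1;
    return (int) (lastByte / blockSize);
  }

  /**
   * Illustrative prefetch count: random IO keeps the current single block,
   * sequential IO is capped at maxPrefetch, whole-file reads ask for
   * everything up to the end of the split.
   */
  static int prefetchCount(ReadPolicy policy, int blocksUntilSplitEnd,
      int maxPrefetch) {
    switch (policy) {
      case RANDOM:
        return Math.min(1, blocksUntilSplitEnd);
      case WHOLE_FILE:
        return blocksUntilSplitEnd;
      default:
        return Math.min(maxPrefetch, blocksUntilSplitEnd);
    }
  }

  public static void main(String[] args) {
    int blockSize = 10240; // assumed block size
    // A 20000-byte read at offset 15000 covers bytes 15000..34999.
    System.out.println(firstBlock(15000L, blockSize));       // 1
    System.out.println(lastBlock(15000L, 20000, blockSize)); // 3
    System.out.println(prefetchCount(ReadPolicy.SEQUENTIAL, 10, 4)); // 4
  }
}
```

A read of a known range would then prefetch blocks firstBlock through
lastBlock, while other reads would size the prefetch from the policy and the
distance to the split end.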