[jira] [Commented] (HADOOP-18852) S3ACachingInputStream.ensureCurrentBuffer(): lazy seek means all reads look like random IO

Viraj Jasani (Jira) Wed, 16 Aug 2023 23:41:05 -0700


    [ https://issues.apache.org/jira/browse/HADOOP-18852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17755384#comment-17755384 ]


Viraj Jasani commented on HADOOP-18852:
---------------------------------------

{quote}also, if a read is in a block other than the current one, but which is 
already being fetched or cached, is this really an OOO read to the extent that 
outstanding fetches should be cancelled?
{quote}
+1 to this. Now that I have checked some logs, I can see a lazy-seek logged for 
the first seek + read on each block:
{code:java}
DEBUG prefetch.S3ACachingInputStream (S3ACachingInputStream.java:ensureCurrentBuffer(141)) - lazy-seek(0:0)
DEBUG prefetch.S3ACachingInputStream (S3ACachingInputStream.java:ensureCurrentBuffer(141)) - lazy-seek(4:40960)
DEBUG prefetch.S3ACachingInputStream (S3ACachingInputStream.java:ensureCurrentBuffer(141)) - lazy-seek(3:30720)
DEBUG prefetch.S3ACachingInputStream (S3ACachingInputStream.java:ensureCurrentBuffer(141)) - lazy-seek(2:20480)
{code}
But the question is also valid: if the block was already being cached, there is 
no obvious reason to cancel the outstanding fetches.
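
To make the question concrete, here is a rough sketch of a check that only cancels prefetches when the target block is neither cached nor already being fetched. Everything in it is hypothetical (the class LazySeekSketch, its two sets, and the onRead/markInFlight/markCached methods); it is not the S3ACachingInputStream or block manager code, and clearing the in-flight set merely stands in for blockManager.cancelPrefetches().

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch only; not the actual S3ACachingInputStream logic. */
final class LazySeekSketch {
  private final Set<Integer> cached = new HashSet<>();
  private final Set<Integer> inFlight = new HashSet<>();
  private int currentBlock = -1; // -1 means nothing has been read yet
  private int cancelCount = 0;

  void markCached(int block) { cached.add(block); }

  void markInFlight(int block) { inFlight.add(block); }

  /** Returns true if this read caused outstanding prefetches to be cancelled. */
  boolean onRead(int targetBlock) {
    if (targetBlock == currentBlock) {
      return false; // same block: not a seek at all
    }
    currentBlock = targetBlock;
    if (cached.contains(targetBlock) || inFlight.contains(targetBlock)) {
      return false; // block is available or on its way: keep the fetches
    }
    // Assumption: a genuinely out-of-order read drops the queued fetches
    // (stand-in for blockManager.cancelPrefetches()) and fetches the target.
    inFlight.clear();
    inFlight.add(targetBlock);
    cancelCount++;
    return true;
  }

  int cancelCount() { return cancelCount; }

  public static void main(String[] args) {
    LazySeekSketch stream = new LazySeekSketch();
    System.out.println("block 0 cancels: " + stream.onRead(0));
    // Pretend blocks 2..4 were queued for prefetch after the first read.
    for (int b = 2; b <= 4; b++) {
      stream.markInFlight(b);
    }
    // Same block order as in the log above: 4, 3, 2.
    for (int b : new int[] {4, 3, 2}) {
      System.out.println("block " + b + " cancels: " + stream.onRead(b));
    }
    System.out.println("total cancels: " + stream.cancelCount());
  }
}
```

With a check like this, the reads of blocks 4, 3 and 2 would not count as out-of-order as long as those blocks were already queued; only the very first read (where nothing is in flight anyway) would go through the cancel path.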

> S3ACachingInputStream.ensureCurrentBuffer(): lazy seek means all reads look 
> like random IO
> ------------------------------------------------------------------------------------------
>
>                 Key: HADOOP-18852
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18852
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: fs/s3
>    Affects Versions: 3.3.6
>            Reporter: Steve Loughran
>            Priority: Major
>
> noticed in HADOOP-18184, but I think it's a big enough issue to be dealt with 
> separately.
> # all seeks are lazy; no fetching is kicked off after an open
> # the first read is treated as an out of order read, so cancels any active 
> reads (don't think there are any) and then only asks for 1 block
> {code}
>     if (outOfOrderRead) {
>       LOG.debug("lazy-seek({})", getOffsetStr(readPos));
>       blockManager.cancelPrefetches();
>       // We prefetch only 1 block immediately after a seek operation.
>       prefetchCount = 1;
>     }
> {code}
> * for any read fully we should prefetch all blocks in the range requested
> * for other reads, we may want a bigger prefetch count than 1, depending on: 
> split start/end, file read policy (random, sequential, whole-file)
> * also, if a read is in a block other than the current one, but which is 
> already being fetched or cached, is this really an OOO read to the extent 
> that outstanding fetches should be cancelled?
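
The first two bullets in the description could be sketched roughly as below. This is an illustration only: the class PrefetchPlanSketch, the ReadPolicy enum, both method names and the maxPrefetch cap are hypothetical and are not part of the S3A prefetching code. The 10240-byte block size is inferred from the offsets in the log lines (block 4 at offset 40960).

```java
/** Hypothetical sketch only; names and policy rules are assumptions. */
final class PrefetchPlanSketch {
  enum ReadPolicy { RANDOM, SEQUENTIAL, WHOLE_FILE }

  /** Number of blocks touched by a read of {@code length} bytes at {@code position}. */
  static int blocksInRange(long position, long length, int blockSize) {
    if (length <= 0) {
      return 0;
    }
    long first = position / blockSize;
    long last = (position + length - 1) / blockSize;
    return (int) (last - first + 1);
  }

  /**
   * Prefetch count for an ordinary read. {@code limit} is the split end or
   * the file length, whichever the caller knows about.
   */
  static int prefetchCount(ReadPolicy policy, long readPos, long limit,
      int blockSize, int maxPrefetch) {
    // Blocks left between the read position and the limit, at least one.
    int remaining = Math.max(1, blocksInRange(readPos, limit - readPos, blockSize));
    switch (policy) {
      case SEQUENTIAL:
        return Math.max(1, Math.min(maxPrefetch, remaining));
      case WHOLE_FILE:
        return remaining;
      case RANDOM:
      default:
        return 1; // matches the current behaviour after a seek
    }
  }

  public static void main(String[] args) {
    int blockSize = 10240;
    // A readFully-style request spanning blocks 2..4.
    System.out.println(blocksInRange(20480, 20481, blockSize));
    System.out.println(prefetchCount(ReadPolicy.SEQUENTIAL, 0, 102400, blockSize, 4));
  }
}
```

Under these assumptions a random-IO policy keeps the existing count of 1, while sequential and whole-file policies scale with how much of the split or file is left.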



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
