[
https://issues.apache.org/jira/browse/HADOOP-18177?focusedWorklogId=761910&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-761910
]
ASF GitHub Bot logged work on HADOOP-18177:
-------------------------------------------
Author: ASF GitHub Bot
Created on: 25/Apr/22 17:00
Start Date: 25/Apr/22 17:00
Worklog Time Spent: 10m
Work Description: ahmarsuhail commented on code in PR #4205:
URL: https://github.com/apache/hadoop/pull/4205#discussion_r857842464
##########
hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/prefetching.md:
##########
@@ -0,0 +1,151 @@
+<!---
+ Licensed under the Apache License, Version 2.0 (the "License");
+ you may not use this file except in compliance with the License.
+ You may obtain a copy of the License at
+
+ http://www.apache.org/licenses/LICENSE-2.0
+
+ Unless required by applicable law or agreed to in writing, software
+ distributed under the License is distributed on an "AS IS" BASIS,
+ WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ See the License for the specific language governing permissions and
+ limitations under the License. See accompanying LICENSE file.
+-->
+
+# S3A Prefetching
+
+
+This document explains the `S3PrefetchingInputStream` and the various components it uses.
+
+This input stream implements prefetching and caching to improve read performance. A high level overview of this feature can also be found in [this](https://medium.com/pinterest-engineering/improving-efficiency-and-reducing-runtime-using-s3-read-optimization-b31da4b60fa0) blog post.
+
+With prefetching, we divide the file into blocks of a fixed size (default is 8MB), associate buffers with these blocks, and then read data into these buffers asynchronously. We also potentially cache these blocks.
+
+### Basic Concepts
+
+* **File**: A binary blob of data stored on some storage device.
+* **Block**: A file is divided into a number of blocks. The default size of a block is 8MB, but this can be configured. The first n-1 blocks are the same size, and the last block may be the same size or smaller (see the example below).
+* **Block based reading**: The granularity of a read is one block. That is, we either read an entire block or none at all. Multiple blocks may be read in parallel.
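+
+For example, with the default 8MB block size, a 20MB file is split into three blocks: two 8MB blocks and a final 4MB block. A sketch of this arithmetic (illustrative Java, not code from the stream implementation):
+
+```java
+long fileSize = 20 * 1024 * 1024;   // 20MB file
+int blockSize = 8 * 1024 * 1024;    // default fs.s3a.prefetch.block.size
+
+// Number of blocks: ceil(fileSize / blockSize) -> 3 for a 20MB file.
+int blockCount = (int) ((fileSize + blockSize - 1) / blockSize);
+
+// The first n-1 blocks are blockSize bytes; the last block holds the remainder (4MB here).
+long lastBlockSize = fileSize - (long) (blockCount - 1) * blockSize;
+```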
+
+### Configuring the stream
+
+| Property | Meaning | Default |
+|---|---|---|
+| `fs.s3a.prefetch.enabled` | Enable the prefetch input stream | `true` |
+| `fs.s3a.prefetch.block.size` | Size of a block | 8MB |
+| `fs.s3a.prefetch.block.count` | Number of blocks to prefetch | 8 |
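+
+These settings can be applied like any other S3A option, for example through the Hadoop `Configuration` API. A minimal sketch (the key names are those listed above; the block size is given in bytes and the bucket name is purely illustrative):
+
+```java
+Configuration conf = new Configuration();
+conf.setBoolean("fs.s3a.prefetch.enabled", true);
+// raise the block size from the default 8MB to 16MB
+conf.setLong("fs.s3a.prefetch.block.size", 16 * 1024 * 1024);
+// prefetch fewer blocks than the default of 8
+conf.setInt("fs.s3a.prefetch.block.count", 4);
+
+FileSystem fs = FileSystem.get(URI.create("s3a://my-bucket/"), conf);
+```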
+
+### Key Components
+
+`S3PrefetchingInputStream` - When prefetching is enabled, `S3AFileSystem` will return an instance of this class as the input stream. Depending on the file size, it will either use the `S3InMemoryInputStream` or the `S3CachingInputStream` as the underlying input stream.
+
+`S3InMemoryInputStream` - Underlying input stream used when the file size is less than the configured block size. It reads the entire file into memory.
+
+`S3CachingInputStream` - Underlying input stream used when the file size is greater than the configured block size. It uses asynchronous prefetching of blocks and caching to improve performance.
+
+`BlockData` - Holds information about the blocks in a file, such as:
+
+* Number of blocks in the file
+* Block size
+* State of each block (initially all blocks have the state *NOT_READY*). Other possible states are: Queued, Ready, Cached.
+
+`BufferData` - Holds the buffer and additional information about it such as:
+
+* The block number this buffer is for
+* State of the buffer (Unknown, Blank, Prefetching, Caching, Ready, Done). The initial state of a buffer is Blank.
+
+`CachingBlockManager` - Implements reading data into the buffer, prefetching, and caching.
+
+`BufferPool` - Manages a fixed-size pool of buffers. It's used by `CachingBlockManager` to acquire buffers.
+
+`S3File` - Implements operations to interact with S3, such as opening and closing the input stream to the S3 file.
+
+`S3Reader` - Implements reading from the stream opened by `S3File`. Reads from this input stream in blocks of 64KB.
+
+`FilePosition` - Provides functionality related to tracking the position in the file. Also gives access to the current buffer in use.
+
+`SingleFilePerBlockCache` - Responsible for caching blocks on the local file system. Each cached block is stored on the local disk as a separate file.
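+
+None of these components are used directly by callers; with prefetching enabled, the stream is obtained through the usual `FileSystem.open()` call, as sketched below (bucket and object names are illustrative):
+
+```java
+Configuration conf = new Configuration();
+conf.setBoolean("fs.s3a.prefetch.enabled", true);
+
+Path path = new Path("s3a://my-bucket/data/large-file.bin");
+FileSystem fs = path.getFileSystem(conf);
+
+byte[] buffer = new byte[1024 * 1024];
+try (FSDataInputStream in = fs.open(path)) {
+  // The wrapped stream is an S3PrefetchingInputStream; reads are served
+  // from blocks that have been prefetched (and possibly cached).
+  int bytesRead = in.read(buffer, 0, buffer.length);
+}
+```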
+
+### Operation
+
+#### S3InMemoryInputStream
+
+Suppose we have a file of size 5MB and the block size is 8MB. Since the file size is less than the block size, the `S3InMemoryInputStream` will be used.
+
+If the caller makes the following read calls:
+
+
+```java
+in.read(buffer, 0, 3 * 1024 * 1024);  // read the first 3MB of the file
+in.read(buffer, 0, 2 * 1024 * 1024);  // read the next 2MB
+```
+
+When the first read is issued, there is no buffer in use yet. We get the data by calling the `ensureCurrentBuffer()` method, which ensures that a buffer with data is available to be read from.
+
+The `ensureCurrentBuffer()` then:
+
+* Reads data into a buffer by calling `S3Reader.read(ByteBuffer buffer, long offset, int size)`.
+* `S3Reader` uses `S3File` to open an input stream to the S3 file by making a `getObject()` request with the range `(0, filesize)`.
+* `S3Reader` reads the entire file into the provided buffer, and once reading is complete closes the S3 stream and frees all underlying resources.
+* Now that the entire file is in a buffer, this buffer is set in `FilePosition` so it can be accessed by the input stream.
+
+The read operation now just gets the required bytes from the buffer in `FilePosition`.
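+
+Conceptually, serving such a read is just a copy out of the in-memory block, along the lines of the following sketch (illustrative only, not the actual `FilePosition` or stream code):
+
+```java
+// Copy up to 'len' of the requested bytes out of the in-memory block.
+int readFromBuffer(ByteBuffer block, byte[] dest, int offset, int len) {
+  int bytesToRead = Math.min(len, block.remaining());
+  if (bytesToRead == 0) {
+    return -1;  // nothing left in the block: end of file for the 5MB example
+  }
+  block.get(dest, offset, bytesToRead);  // advances the block's read position
+  return bytesToRead;
+}
+```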
+
+When the second read is issued, there is already a valid buffer which can be used. Nothing else needs to be done; we just read the required bytes from this buffer.
+
+#### S3CachingInputStream
+
+
+
+[Image: image.png]
Review Comment:
sorry, I didn't mean to commit this. I was initially using the image in the
blogpost:
https://medium.com/pinterest-engineering/improving-efficiency-and-reducing-runtime-using-s3-read-optimization-b31da4b60fa0,
but that diagram is very high level so I'm not sure if it adds much here. Do
we think having that will help? Or do we think we need a lower level
architecture diagram?
Issue Time Tracking
-------------------
Worklog Id: (was: 761910)
Time Spent: 1h (was: 50m)
> document use and architecture design of prefetching s3a input stream
> --------------------------------------------------------------------
>
> Key: HADOOP-18177
> URL: https://issues.apache.org/jira/browse/HADOOP-18177
> Project: Hadoop Common
> Issue Type: Sub-task
> Components: documentation, fs/s3
> Affects Versions: 3.4.0
> Reporter: Steve Loughran
> Assignee: Ahmar Suhail
> Priority: Major
> Labels: pull-request-available
> Time Spent: 1h
> Remaining Estimate: 0h
>
> Document S3PrefetchingInputStream for users (including any new failure modes
> in troubleshooting) and the architecture for maintainers
> there's some markdown in
> hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/read/README.md
> already