[ https://issues.apache.org/jira/browse/HADOOP-18177?focusedWorklogId=761912&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-761912 ]

ASF GitHub Bot logged work on HADOOP-18177:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 25/Apr/22 17:00
            Start Date: 25/Apr/22 17:00
    Worklog Time Spent: 10m 
      Work Description: ahmarsuhail commented on code in PR #4205:
URL: https://github.com/apache/hadoop/pull/4205#discussion_r857842762


##########
hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/prefetching.md:
##########
@@ -0,0 +1,151 @@
+<!---
+  Licensed under the Apache License, Version 2.0 (the "License");
+  you may not use this file except in compliance with the License.
+  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License. See accompanying LICENSE file.
+-->
+
+# S3A Prefetching
+
+
+This document explains the `S3PrefetchingInputStream` and the various components it uses.
+
+This input stream implements prefetching and caching to improve its read performance. A high-level overview of this feature can also be found in [this](https://medium.com/pinterest-engineering/improving-efficiency-and-reducing-runtime-using-s3-read-optimization-b31da4b60fa0) blog post.
+
+With prefetching, we divide the file into blocks of a fixed size (default is 8MB), associate buffers to these blocks, and then read data into these buffers asynchronously. We also potentially cache these blocks.

Review Comment:
   good point, have removed 
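The block-based design described in the diff can be illustrated with a small Java sketch. It is a hypothetical, self-contained model, not the actual `S3PrefetchingInputStream` code: the class `PrefetchSketch`, its methods, and the in-memory byte array standing in for the S3 object are all assumptions made for illustration. It splits the data into fixed-size blocks, reads each block into its own `ByteBuffer` asynchronously on an executor, keeps the resulting futures in a map as a simple cache, and starts a read-ahead of the next block whenever a byte is read.

```java
import java.nio.ByteBuffer;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical sketch of block-based prefetching; not the Hadoop implementation.
class PrefetchSketch {
    // Mirrors the 8MB default block size mentioned in the doc under review.
    static final int DEFAULT_BLOCK_SIZE = 8 * 1024 * 1024;

    private final byte[] source;   // stands in for the remote S3 object (assumption)
    private final int blockSize;
    private final ExecutorService pool;
    // Simple cache: block index -> future holding that block's buffer.
    private final Map<Integer, CompletableFuture<ByteBuffer>> blocks = new ConcurrentHashMap<>();

    PrefetchSketch(byte[] source, int blockSize, ExecutorService pool) {
        this.source = source;
        this.blockSize = blockSize;
        this.pool = pool;
    }

    PrefetchSketch(byte[] source, ExecutorService pool) {
        this(source, DEFAULT_BLOCK_SIZE, pool);
    }

    // Number of fixed-size blocks, rounding up so a short final block is counted.
    int blockCount() {
        return (int) ((source.length + (long) blockSize - 1) / blockSize);
    }

    boolean isCached(int index) {
        return blocks.containsKey(index);
    }

    // Start (or reuse) an asynchronous read of one block into its own buffer.
    CompletableFuture<ByteBuffer> prefetch(int index) {
        return blocks.computeIfAbsent(index,
            i -> CompletableFuture.supplyAsync(() -> readBlock(i), pool));
    }

    // Copies one block out of the source; a real stream would issue a ranged GET here.
    private ByteBuffer readBlock(int index) {
        int start = index * blockSize;
        int len = Math.min(blockSize, source.length - start);
        ByteBuffer buf = ByteBuffer.allocate(len);
        buf.put(source, start, len);
        buf.flip();
        return buf;
    }

    // Returns the byte at pos (0-255), or -1 if pos is outside the data.
    int read(long pos) throws Exception {
        if (pos < 0 || pos >= source.length) {
            return -1;
        }
        int index = (int) (pos / blockSize);
        if (index + 1 < blockCount()) {
            prefetch(index + 1); // fire-and-forget read-ahead of the next block
        }
        ByteBuffer buf = prefetch(index).get(); // blocks only if not yet loaded
        return buf.get((int) (pos % blockSize)) & 0xFF;
    }

    public static void main(String[] args) throws Exception {
        byte[] data = new byte[20];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        ExecutorService pool = Executors.newFixedThreadPool(2);
        PrefetchSketch stream = new PrefetchSketch(data, 8, pool);
        System.out.println(stream.blockCount()); // 3
        System.out.println(stream.read(10));     // 10
        System.out.println(stream.isCached(2));  // true (read-ahead started)
        System.out.println(stream.read(20));     // -1
        pool.shutdown();
    }
}
```

With a block size of 8 bytes, the 20-byte input splits into three blocks (the last one 4 bytes long). Reading position 10 loads block 1 and schedules block 2, so a subsequent sequential read can usually be served from an already-filled buffer.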





Issue Time Tracking
-------------------

    Worklog Id:     (was: 761912)
    Time Spent: 1h 10m  (was: 1h)

> document use and architecture design of prefetching s3a input stream
> --------------------------------------------------------------------
>
>                 Key: HADOOP-18177
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18177
>             Project: Hadoop Common
>          Issue Type: Sub-task
>          Components: documentation, fs/s3
>    Affects Versions: 3.4.0
>            Reporter: Steve Loughran
>            Assignee: Ahmar Suhail
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 1h 10m
>  Remaining Estimate: 0h
>
> Document S3PrefetchingInputStream for users (including any new failure modes
> in troubleshooting) and the architecture for maintainers.
> There's some markdown in
> hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/read/README.md
> already.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)
