[ https://issues.apache.org/jira/browse/HADOOP-18028?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17713065#comment-17713065 ]
ASF GitHub Bot commented on HADOOP-18028:
-----------------------------------------
ahmarsuhail commented on PR #5559:
URL: https://github.com/apache/hadoop/pull/5559#issuecomment-1511252967
looks good so far. Not sure if this is helpful, but the patches that came after
this big commit are (listed in the order they were committed to trunk):
- ITestS3ACannedACLs failure; not in a span:
[JIRA](https://issues.apache.org/jira/browse/HADOOP-18385),
[PR](https://github.com/apache/hadoop/pull/4736)
- fs.s3a.prefetch.block.size to be read through longBytesOption:
[JIRA](https://issues.apache.org/jira/browse/HADOOP-18380),
[PR](https://github.com/apache/hadoop/pull/4762)
- s3a prefetching to use SemaphoredDelegatingExecutor for submitting work:
[JIRA](https://issues.apache.org/jira/browse/HADOOP-18186),
[PR](https://github.com/apache/hadoop/pull/4796)
- hadoop-aws maven build to add a prefetch profile to run all tests with
prefetching: [JIRA](https://issues.apache.org/jira/browse/HADOOP-18377),
[PR](https://github.com/apache/hadoop/pull/4914)
- s3a prefetching Executor should be closed:
[JIRA](https://issues.apache.org/jira/browse/HADOOP-18455),
[PR](https://github.com/apache/hadoop/pull/4879) &
[PR](https://github.com/apache/hadoop/pull/4926)
- Implement readFully(long position, byte[] buffer, int offset, int length) -
[JIRA](https://issues.apache.org/jira/browse/HADOOP-18378),
[PR](https://github.com/apache/hadoop/pull/4955)
- S3PrefetchingInputStream to support status probes when closed -
[JIRA](https://issues.apache.org/jira/browse/HADOOP-18189),
[PR](https://github.com/apache/hadoop/pull/5036)
- assertion failure in ITestS3APrefetchingInputStream -
[JIRA](https://issues.apache.org/jira/browse/HADOOP-18531),
[PR](https://github.com/apache/hadoop/pull/5149)
- Remove lower limit on s3a prefetching/caching block size -
[JIRA](https://issues.apache.org/jira/browse/HADOOP-18246),
[PR](https://github.com/apache/hadoop/pull/5120)
- S3A prefetching: Error logging during reads -
[JIRA](https://issues.apache.org/jira/browse/HADOOP-18351),
[PR](https://github.com/apache/hadoop/pull/5274)
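For anyone reviewing the HADOOP-18378 backport: the positioned readFully(long position, byte[] buffer, int offset, int length) has to copy exactly `length` bytes starting at `position`, looping over short reads, and fail with an EOFException if the data ends first. Below is a minimal sketch of that contract, not the S3A code. `PositionedSource` and `chunky()` are hypothetical stand-ins for a positioned read that may return fewer bytes than requested.

```java
import java.io.EOFException;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

/**
 * Sketch of the positioned readFully contract. PositionedSource is a
 * hypothetical primitive, not a Hadoop interface.
 */
public class ReadFullySketch {

    /** Hypothetical positioned read: may return a short count, or -1 at EOF. */
    interface PositionedSource {
        int read(long position, byte[] buffer, int offset, int length) throws IOException;
    }

    /** Loops until exactly `length` bytes are copied, or throws EOFException. */
    static void readFully(PositionedSource src, long position,
                          byte[] buffer, int offset, int length) throws IOException {
        if (offset < 0 || length < 0 || length > buffer.length - offset) {
            throw new IndexOutOfBoundsException("offset=" + offset + ", length=" + length);
        }
        int copied = 0;
        while (copied < length) {
            int n = src.read(position + copied, buffer, offset + copied, length - copied);
            if (n < 0) {
                throw new EOFException("EOF after " + copied + " of " + length + " bytes");
            }
            copied += n;
        }
    }

    /** In-memory source returning at most 3 bytes per call, to force short reads. */
    static PositionedSource chunky(byte[] data) {
        return (position, buffer, offset, length) -> {
            if (position >= data.length) {
                return -1;
            }
            int n = (int) Math.min(Math.min(length, 3), data.length - position);
            System.arraycopy(data, (int) position, buffer, offset, n);
            return n;
        };
    }

    public static void main(String[] args) throws IOException {
        byte[] data = "hello, prefetching world".getBytes(StandardCharsets.US_ASCII);
        byte[] out = new byte[10];
        readFully(chunky(data), 7, out, 0, 10);
        System.out.println(new String(out, StandardCharsets.US_ASCII)); // prints "prefetchin"
    }
}
```

The 3-byte source is deliberately stingy. A readFully that issues only a single read call would return a partially filled buffer here instead of "prefetchin".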
Patch available, but not merged yet:
- SingleFilePerBlockCache to use LocalDirAllocator for file allocation:
[JIRA](https://issues.apache.org/jira/browse/HADOOP-18399),
[PR](https://github.com/apache/hadoop/pull/5054)
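Two of the merged patches in the list above, HADOOP-18186 and HADOOP-18455, concern how prefetch work is submitted and cleaned up: cap the number of in-flight block fetches, and shut the executor down when the stream is finished with it. Here is a minimal sketch of that pattern using only java.util.concurrent. `BoundedPrefetchExecutor` is a hypothetical class, not Hadoop's SemaphoredDelegatingExecutor, whose actual API may differ.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical executor wrapper: a semaphore caps in-flight tasks; close() shuts down. */
public class BoundedPrefetchExecutor implements AutoCloseable {
    private final ExecutorService delegate;
    private final Semaphore permits;

    BoundedPrefetchExecutor(ExecutorService delegate, int maxInFlight) {
        this.delegate = delegate;
        this.permits = new Semaphore(maxInFlight);
    }

    /** Blocks the caller until a permit is free, then submits the task. */
    <T> Future<T> submit(Callable<T> task) throws InterruptedException {
        permits.acquire();
        try {
            return delegate.submit(() -> {
                try {
                    return task.call();
                } finally {
                    permits.release(); // free the slot when the task finishes
                }
            });
        } catch (RuntimeException e) {
            permits.release(); // submission was rejected: give the permit back
            throw e;
        }
    }

    /** Shuts the delegate down so its worker threads do not leak. */
    @Override
    public void close() throws InterruptedException {
        delegate.shutdown();
        if (!delegate.awaitTermination(30, TimeUnit.SECONDS)) {
            delegate.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        AtomicInteger running = new AtomicInteger();
        AtomicInteger peak = new AtomicInteger();
        try (BoundedPrefetchExecutor ex =
                 new BoundedPrefetchExecutor(Executors.newFixedThreadPool(8), 2)) {
            for (int i = 0; i < 6; i++) {
                ex.submit(() -> {
                    int now = running.incrementAndGet();
                    peak.accumulateAndGet(now, Math::max);
                    Thread.sleep(20); // stand-in for fetching one block
                    running.decrementAndGet();
                    return null;
                });
            }
        }
        System.out.println("peak concurrent tasks: " + peak.get()); // never more than 2
    }
}
```

The pool has 8 threads, but the semaphore still holds concurrency at 2. That is the point of wrapping a shared executor instead of sizing a dedicated pool per stream.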
> High performance S3A input stream with prefetching & caching
> ------------------------------------------------------------
>
> Key: HADOOP-18028
> URL: https://issues.apache.org/jira/browse/HADOOP-18028
> Project: Hadoop Common
> Issue Type: Improvement
> Components: fs/s3
> Reporter: Bhalchandra Pandit
> Assignee: Bhalchandra Pandit
> Priority: Major
> Labels: pull-request-available
> Time Spent: 14.5h
> Remaining Estimate: 0h
>
> I work for Pinterest. I developed a technique for vastly improving read
> throughput when reading from the S3 file system. It not only helps the
> sequential read case (like reading a SequenceFile) but also significantly
> improves read throughput in the random-access case (like reading Parquet). This
> technique has been very useful in significantly improving efficiency of the
> data processing jobs at Pinterest.
>
> I would like to contribute that feature to Apache Hadoop. More details on
> this technique are available in this blog I wrote recently:
> [https://medium.com/pinterest-engineering/improving-efficiency-and-reducing-runtime-using-s3-read-optimization-b31da4b60fa0]
>
--
This message was sent by Atlassian Jira
(v8.20.10#820010)