[ https://issues.apache.org/jira/browse/HADOOP-18028?focusedWorklogId=747970&page=com.atlassian.jira.plugin.system.issuetabpanels:worklog-tabpanel#worklog-747970 ]

ASF GitHub Bot logged work on HADOOP-18028:
-------------------------------------------

                Author: ASF GitHub Bot
            Created on: 25/Mar/22 20:16
            Start Date: 25/Mar/22 20:16
    Worklog Time Spent: 10m
      Work Description: steveloughran opened a new pull request #4109:
URL: https://github.com/apache/hadoop/pull/4109

### Description of PR

This is the PR of #3736 applied to a dedicated feature branch, with some minor pre-merge tuning; all subsequent changes are to be their own PRs.

* rename test classes, have AbstractHadoopTestBase as the base
* package info files for new packages
* import ordering
* move to intercept() for assertions; ExceptionAsserts is invoking it and can be removed in future.

This adds a dependency on a Twitter lib which looks like Scala code. That MUST be cut before we can merge.

### How was this patch tested?

S3 London with `-Dparallel-tests -DtestsThreadCount=8 -Dmarkers=keep -Dscale`

There are some failing integration tests which will need to be fixed before the feature branch is merged to trunk.

### For code changes:

- [X] Does the title of this PR start with the corresponding JIRA issue id (e.g. 'HADOOP-17799. Your PR title ...')?
- [X] Object storage: have the integration tests been executed and the endpoint declared according to the connector-specific documentation?
- [ ] If adding new dependencies to the code, are these dependencies licensed in a way that is compatible for inclusion under [ASF 2.0](http://www.apache.org/legal/resolved.html#category-a)?
- [ ] If applicable, have you updated the `LICENSE`, `LICENSE-binary`, `NOTICE-binary` files?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: common-issues-unsubscr...@hadoop.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

Issue Time Tracking
-------------------

    Worklog Id:     (was: 747970)
    Time Spent: 12h  (was: 11h 50m)

> improve S3 read speed using prefetching & caching
> -------------------------------------------------
>
>                 Key: HADOOP-18028
>                 URL: https://issues.apache.org/jira/browse/HADOOP-18028
>             Project: Hadoop Common
>          Issue Type: Improvement
>          Components: fs/s3
>            Reporter: Bhalchandra Pandit
>            Priority: Major
>              Labels: pull-request-available
>          Time Spent: 12h
>  Remaining Estimate: 0h
>
> I work for Pinterest. I developed a technique for vastly improving read
> throughput when reading from the S3 file system. It not only helps the
> sequential read case (like reading a SequenceFile) but also significantly
> improves read throughput of a random access case (like reading Parquet). This
> technique has been very useful in significantly improving efficiency of the
> data processing jobs at Pinterest.
>
> I would like to contribute that feature to Apache Hadoop. More details on
> this technique are available in this blog I wrote recently:
> [https://medium.com/pinterest-engineering/improving-efficiency-and-reducing-runtime-using-s3-read-optimization-b31da4b60fa0]
>

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
For additional commands, e-mail: common-issues-h...@hadoop.apache.org