a2l007 opened a new pull request #11899: URL: https://github.com/apache/druid/pull/11899
While fetching segments from S3, Druid currently creates an object summary (a LIST operation) for each segment before issuing the GET for the object, so the number of LIST operations is proportional to the number of segments. Since LIST operations are more expensive than GETs, it is desirable to reduce them, especially if the LIST limit is much smaller than the GET limit.

This PR creates the object summary lazily. The summary isn't required for pulling segments: the bucket and prefix can be retrieved from the URI, and the check that the object is present in the bucket is already done before attempting to pull the segment. This reduces the number of LIST operations to zero while pulling segments.

<hr>

This PR has:
- [x] been self-reviewed.
- [x] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
- [x] been tested in a test Druid cluster.
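
For illustration only, here is a minimal sketch of the lazy-summary idea in plain Java. It is not the code from this PR: `CountingClient`, `Lazy`, and `SegmentObject` are hypothetical names, and the client is a stand-in that only counts calls instead of talking to S3. The sketch assumes the segment location is given as an `s3://bucket/key` URI, so the bucket and key can be read from the URI without a LIST call.

```java
import java.net.URI;
import java.util.function.Supplier;

public class LazySummaryDemo {

    /** Hypothetical stand-in for an S3 client; it only counts LIST and GET calls. */
    public static class CountingClient {
        public int listCalls = 0;
        public int getCalls = 0;

        public String listSummary(String bucket, String key) {
            listCalls++;
            return "summary(" + bucket + "/" + key + ")";
        }

        public byte[] getObject(String bucket, String key) {
            getCalls++;
            return new byte[0];
        }
    }

    /** Memoizing supplier: the delegate runs at most once, on the first get(). */
    public static final class Lazy<T> implements Supplier<T> {
        private final Supplier<T> delegate;
        private T value;
        private boolean initialized = false;

        public Lazy(Supplier<T> delegate) {
            this.delegate = delegate;
        }

        @Override
        public synchronized T get() {
            if (!initialized) {
                value = delegate.get();
                initialized = true;
            }
            return value;
        }
    }

    /** Hypothetical segment handle: bucket and key come from the URI, the summary is deferred. */
    public static final class SegmentObject {
        public final String bucket;
        public final String key;
        public final Lazy<String> summary;

        public SegmentObject(CountingClient client, URI uri) {
            String b = uri.getAuthority();          // "my-bucket" for s3://my-bucket/...
            String k = uri.getPath().substring(1);  // drop the leading "/"
            this.bucket = b;
            this.key = k;
            // No LIST call happens here; it is deferred until summary.get() is called.
            this.summary = new Lazy<>(() -> client.listSummary(b, k));
        }

        /** Pulling the segment needs only the bucket and key, so it issues a GET and no LIST. */
        public byte[] pull(CountingClient client) {
            return client.getObject(bucket, key);
        }
    }

    public static void main(String[] args) {
        CountingClient client = new CountingClient();
        SegmentObject segment =
            new SegmentObject(client, URI.create("s3://my-bucket/druid/segments/index.zip"));
        segment.pull(client);
        System.out.println("LIST=" + client.listCalls + " GET=" + client.getCalls);
    }
}
```

In this sketch the LIST counter stays at zero for as long as callers only pull the segment. A caller that does need the summary pays for a single LIST on first access, and repeated accesses reuse the cached value.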
