clintropolis opened a new pull request, #19560: URL: https://github.com/apache/druid/pull/19560
### Description

This PR implements `SegmentRangeReader` for S3 so that V10 segments stored unzipped (after this PR) can be partially loaded via range read requests. This is achieved by setting a new `rangeable` boolean on `S3LoadSpec` at publish time, so it only applies to segments written after this change. Existing V10 segments will fall back to a full download, for S3 at least; I think this is ok since it is an experimental feature and off by default, else we would need to do some sort of ugly existence check in a place where a check should be cheap and non-blocking.

changes:
* Adds new `S3SegmentRangeReader` that wraps `ServerSideEncryptingAmazonS3` + bucket + key prefix and issues closed-range `GetObjectRequest`s against `keyPrefix + filename`. The returned stream is wrapped in a `RetryingInputStream` with the `S3Utils.S3RETRY` predicate (the same retry policy `S3DataSegmentPuller` uses for full-segment downloads), so a transient mid-stream error reopens at the byte offset where it failed and resumes with a fresh range request for the remaining bytes, rather than restarting the whole read.
* New `rangeable` boolean on `S3LoadSpec`, stamped by the pusher at write time. `S3LoadSpec.openRangeReader()` returns a reader iff the flag is true and the key is not a `.zip`.
* `S3DataSegmentPusher.pushNoZip` stamps `rangeable=true` when `binaryVersion` is `V10_VERSION`, and `false` otherwise. `pushZip` omits the field.
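
To illustrate the resume-on-failure behavior and the `rangeable` gate described above, here is a minimal, self-contained sketch. It is not the code in this PR: `RangeReadSketch`, `RangeOpener`, `ResumingRangeStream`, and `canRangeRead` are hypothetical names, the opener stands in for the S3 client, the retry predicate is reduced to "any `IOException`, up to a fixed count", and the `.zip` check is assumed to be a suffix test on the key.

```java
import java.io.IOException;
import java.io.InputStream;

public class RangeReadSketch {
  /** Hypothetical stand-in for an object store serving the closed byte range [start, endInclusive]. */
  interface RangeOpener {
    InputStream open(long start, long endInclusive) throws IOException;
  }

  /**
   * Reads the closed range [start, endInclusive]. When the underlying stream
   * throws, it reopens at the first unread byte instead of restarting at start.
   */
  static final class ResumingRangeStream extends InputStream {
    private final RangeOpener opener;
    private final long end;
    private final int maxRetries;
    private long position; // absolute offset of the next byte to return
    private int retries;
    private InputStream delegate;

    ResumingRangeStream(RangeOpener opener, long start, long endInclusive, int maxRetries)
        throws IOException {
      this.opener = opener;
      this.end = endInclusive;
      this.maxRetries = maxRetries;
      this.position = start;
      this.delegate = opener.open(start, endInclusive);
    }

    @Override
    public int read() throws IOException {
      while (true) {
        if (position > end) {
          return -1; // the whole requested range has been delivered
        }
        try {
          int b = delegate.read();
          if (b >= 0) {
            position++;
          }
          return b;
        } catch (IOException e) {
          // Assumption: every IOException is retryable, up to maxRetries in total.
          if (retries++ >= maxRetries) {
            throw e;
          }
          try {
            delegate.close();
          } catch (IOException ignored) {
            // the broken stream is being discarded anyway
          }
          // Fresh range request for the remaining bytes only.
          delegate = opener.open(position, end);
        }
      }
    }

    @Override
    public void close() throws IOException {
      delegate.close();
    }
  }

  /** Assumed gate: range reads only when the flag is set and the key is not a zip. */
  static boolean canRangeRead(boolean rangeable, String key) {
    return rangeable && !key.endsWith(".zip");
  }
}
```

With an S3-backed opener, each `open(start, endInclusive)` call would correspond to one ranged GET, where both ends of the range are inclusive. Tracking `position` as an absolute offset is what lets the retry request only the bytes not yet delivered.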
