clintropolis opened a new pull request, #19560:
URL: https://github.com/apache/druid/pull/19560

   ### Description
   This PR implements `SegmentRangeReader` for S3 so that V10 segments stored 
unzipped (after this PR) can be partially loaded via range read requests. This 
is achieved by setting a new `rangeable` boolean on `S3LoadSpec` at publish 
time, so it only applies to segments written after this change. Any existing 
V10 segments will fall back to a full download, for S3 at least. I think this 
is ok since it is an experimental feature and off by default; otherwise we 
would need to do some sort of ugly existence check in a place where a check 
should be cheap and non-blocking.
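
A minimal sketch of the load-side decision described above, to show why no existence check is needed: the flag alone decides between a range reader and a full download. `RangeableLoadSpecSketch`, `loadPlan`, and the example keys are hypothetical stand-ins, not the classes or paths in this PR.

```java
import java.util.Optional;

// Hypothetical stand-in for the load spec described above; not Druid's actual S3LoadSpec.
final class RangeableLoadSpecSketch {
    private final String key;
    // Boxed so that an absent field (existing segments, zip pushes) maps to null.
    private final Boolean rangeable;

    RangeableLoadSpecSketch(String key, Boolean rangeable) {
        this.key = key;
        this.rangeable = rangeable;
    }

    // A reader is offered only when the flag was stamped true and the key is not a zip.
    Optional<String> openRangeReader() {
        boolean ok = Boolean.TRUE.equals(rangeable) && !key.endsWith(".zip");
        return ok ? Optional.of("range reader for " + key) : Optional.empty();
    }

    // Callers fall back to a full download when no reader is offered; S3 is never probed.
    String loadPlan() {
        return openRangeReader()
            .map(reader -> "partial load via " + reader)
            .orElse("full download of " + key);
    }

    public static void main(String[] args) {
        // Example keys are made up for illustration.
        System.out.println(new RangeableLoadSpecSketch("ds/seg/0/", true).loadPlan());
        System.out.println(new RangeableLoadSpecSketch("ds/seg/0/", null).loadPlan());
        System.out.println(new RangeableLoadSpecSketch("ds/seg/0/index.zip", true).loadPlan());
    }
}
```

The decision reads only fields already present on the load spec, which keeps it cheap and non-blocking; a segment published before the flag existed simply deserializes with the field absent and takes the full-download path.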
   
   changes:
   * Adds a new `S3SegmentRangeReader` that wraps a 
`ServerSideEncryptingAmazonS3`, a bucket, and a key prefix, and issues 
closed-range `GetObjectRequest`s against `keyPrefix + filename`. The returned 
stream is wrapped in a `RetryingInputStream` with the `S3Utils.S3RETRY` 
predicate (the same retry policy `S3DataSegmentPuller` uses for full-segment 
downloads), so a transient mid-stream error reopens at the byte offset where it 
failed and resumes with a fresh range request for the remaining bytes, rather 
than restarting the whole read.
   * Adds a new `rangeable` boolean on `S3LoadSpec`, stamped by the pusher at 
write time. `S3LoadSpec.openRangeReader()` returns a reader iff the flag is 
true and the key isn't a `.zip`.
   * `S3DataSegmentPusher.pushNoZip` stamps `rangeable=true` when 
`binaryVersion` is `V10_VERSION`, and `false` otherwise. `pushZip` omits the 
field.
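
The resume-at-offset behavior in the first bullet can be sketched roughly as below. This is not Druid's `RetryingInputStream`: `ResumingRangeSketch`, `RangeOpener`, `ResumingStream`, and `readWithOneFailure` are hypothetical names, the "object" is an in-memory byte array rather than S3, and the sketch retries on any `IOException` with no backoff instead of consulting a predicate such as `S3Utils.S3RETRY`.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of resume-at-offset range reads; not Druid's RetryingInputStream.
final class ResumingRangeSketch {
    // Hypothetical opener: returns a stream over bytes [start, end], both inclusive.
    interface RangeOpener {
        InputStream open(long start, long end) throws IOException;
    }

    static final class ResumingStream extends InputStream {
        private final RangeOpener opener;
        private final long end;
        private final int maxRetries;
        private long position; // absolute offset of the next byte to return
        private int retries;
        private InputStream delegate;

        ResumingStream(RangeOpener opener, long start, long end, int maxRetries) throws IOException {
            this.opener = opener;
            this.end = end;
            this.maxRetries = maxRetries;
            this.position = start;
            this.delegate = opener.open(start, end);
        }

        @Override
        public int read() throws IOException {
            while (position <= end) {
                try {
                    int b = delegate.read();
                    if (b >= 0) {
                        position++;
                    }
                    return b;
                } catch (IOException e) {
                    if (++retries > maxRetries) {
                        throw e;
                    }
                    // Reopen only the remaining bytes instead of restarting at the original start.
                    delegate = opener.open(position, end);
                }
            }
            return -1;
        }

        @Override
        public void close() throws IOException {
            delegate.close();
        }
    }

    // Records the Range header value each request would carry (HTTP byte ranges are inclusive).
    static final List<String> requests = new ArrayList<>();

    // Reads [start, end] from an in-memory "object", failing once after failAfter bytes.
    static byte[] readWithOneFailure(byte[] object, int start, int end, int failAfter) {
        requests.clear();
        boolean[] failedOnce = {false};
        RangeOpener opener = (s, e) -> {
            requests.add("bytes=" + s + "-" + e);
            InputStream fresh = new ByteArrayInputStream(object, (int) s, (int) (e - s + 1));
            if (failedOnce[0]) {
                return fresh;
            }
            failedOnce[0] = true;
            return new InputStream() {
                private int served;

                @Override
                public int read() throws IOException {
                    if (served++ == failAfter) {
                        throw new IOException("simulated connection reset");
                    }
                    return fresh.read();
                }
            };
        };
        try (InputStream stream = new ResumingStream(opener, start, end, 3)) {
            return stream.readAllBytes();
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    public static void main(String[] args) {
        byte[] object = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9};
        byte[] got = readWithOneFailure(object, 2, 7, 3);
        System.out.println(requests + " -> " + got.length + " bytes");
    }
}
```

In the demo, the first request covers `bytes=2-7`; after three bytes the simulated failure triggers a second request for `bytes=5-7` only, which is the point of tracking the absolute offset rather than re-reading from the start.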


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

