clintropolis opened a new pull request, #18683: URL: https://github.com/apache/druid/pull/18683
### Description

This PR makes some behavioral changes to how historical servers manage their cache files when operating in the 'virtual storage' mode added in #18176, to provide a better experience and more flexibility to callers.

The main change is that in virtual storage mode, it is now impossible for a segment to be 'missing' in the `SegmentManager` if a caller has a `DataSegment`. By 'missing' segments, I am referring to the specific response a historical server provides in the query response context when a segment drop command occurs before the query engine has a chance to obtain a reference to the files (or even before the historical receives the request). The broker uses this response to retry the request against the new location the segment has been moved to (if retries are enabled and the segment is available elsewhere).

Now, in virtual storage mode, the historical will simply download the segment on demand and process it for any `DataSegment` object. For example, when using `ServerManager` to get segments, this means that after resolving the set of `DataSegment` objects participating in a query from the timeline, the query engine can always obtain a `SegmentReference`, so long as there is adequate disk space for the download (and the segment is present in deep storage, of course).

To complement this behavior, the segment cache itself no longer actively unmounts weakly held segment cache entries, making the coordinator drop command a near no-op. Instead, files remain on disk until they are evicted because some other segment needs to reclaim the space so that it can be loaded. This does mean that the disk will 'fill up' over time, but the benefits are worth it.

To accommodate this and ensure that segment files are tracked across process boundaries (for example, after a failure), segment info files are now deleted via an 'unmount' hook on the segment files cache entry, meaning that they will not be removed until the segment files are also removed.
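The cache behavior described above can be illustrated with a small, self-contained sketch. This is not Druid's implementation: `SegmentCacheSketch`, `CacheEntry`, `acquire`, `drop`, and the in-memory `infoFiles` list are all hypothetical names invented for illustration, and eviction order (oldest first) is an assumption. It only models three ideas from the description: on-demand loading, drop as a no-op, and info-file cleanup tied to an unmount hook.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of the described cache behavior; not Druid's actual classes. */
class SegmentCacheSketch {
    /** One cached segment: its size on disk and a hook to run when it is unmounted. */
    static final class CacheEntry {
        final long sizeBytes;
        final Runnable onUnmount; // in this sketch: removes the segment info file

        CacheEntry(long sizeBytes, Runnable onUnmount) {
            this.sizeBytes = sizeBytes;
            this.onUnmount = onUnmount;
        }
    }

    private final long capacityBytes;
    private long usedBytes = 0;
    // Insertion-ordered, so eviction visits the oldest entries first (an assumption).
    private final Map<String, CacheEntry> entries = new LinkedHashMap<>();
    // Stand-in for the segment info files that would live on disk.
    final List<String> infoFiles = new ArrayList<>();

    SegmentCacheSketch(long capacityBytes) {
        this.capacityBytes = capacityBytes;
    }

    /**
     * Makes the segment available, "downloading" it on demand. Existing entries
     * are evicted only when space is needed. Returns false if it can never fit.
     */
    boolean acquire(String segmentId, long sizeBytes) {
        if (entries.containsKey(segmentId)) {
            return true; // files are still on disk, nothing to download
        }
        if (sizeBytes > capacityBytes) {
            return false; // not enough disk space even with an empty cache
        }
        Iterator<Map.Entry<String, CacheEntry>> it = entries.entrySet().iterator();
        while (usedBytes + sizeBytes > capacityBytes && it.hasNext()) {
            CacheEntry victim = it.next().getValue();
            it.remove();
            usedBytes -= victim.sizeBytes;
            victim.onUnmount.run(); // the info file goes away with the segment files
        }
        infoFiles.add(segmentId);
        entries.put(segmentId, new CacheEntry(sizeBytes, () -> infoFiles.remove(segmentId)));
        usedBytes += sizeBytes;
        return true;
    }

    /** Coordinator drop: a no-op here; files stay until space must be reclaimed. */
    void drop(String segmentId) {
        // Intentionally does nothing in this sketch.
    }

    boolean isOnDisk(String segmentId) {
        return entries.containsKey(segmentId);
    }
}
```

In this model, a segment that has been dropped remains on disk and keeps its info file until a later `acquire` needs the space, at which point the unmount hook removes both together. A real cache would also need to protect entries that are currently in use by a query, which the sketch omits.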
-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
