pjain1 opened a new issue #11297: URL: https://github.com/apache/druid/issues/11297
**Using deep storage as intermediate store for shuffle tasks**

### Description

If autoscaling for MiddleManagers (MMs) is enabled, the MM that generated an intermediate index might no longer be available because it was scaled down. It would therefore be useful to have an option to use deep storage for intermediate data.

### Changes

#### For pushing partial segments

`ShuffleDataSegmentPusher` uses `IntermediaryDataManager`, which can be converted into an interface with the following methods:

1. `long addSegment(String supervisorTaskId, String subTaskId, DataSegment segment, URI segmentLocation)`
2. `Optional<ByteSource> findPartitionFile(String supervisorTaskId, String subTaskId, Interval interval, int bucketId)`
3. `void deletePartitions(String supervisorTaskId)`

The default implementation of `IntermediaryDataManager` can be `LocalIntermediaryDataManager`, which manages partial segments locally on the MM. Optional implementations can be added via extensions to support different deep storages or other locations.

#### For pulling partial segments

`ShuffleClient` is already an interface with a default implementation, `HttpShuffleClient`, so implementations only need to be added for other storage types. The interface method needs to be changed to `File fetchSegmentFile(URI partitionDir, String supervisorTaskId, P location)`. It may also be necessary to check whether a different implementation of `PartitionLocation` is needed.

### Motivation

To make shuffle work with MM autoscaling.
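The proposed interface split can be sketched as follows. This is a minimal, hypothetical stand-in, not the actual Druid code: Druid types such as `DataSegment`, `ByteSource`, and `Interval` are simplified to `String`/`byte[]`, and the in-memory implementation below only illustrates the contract that a `LocalIntermediaryDataManager` or a deep-storage extension would fulfill.

```java
import java.io.File;
import java.net.URI;
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical shape of the proposed IntermediaryDataManager interface
// (Druid's DataSegment/ByteSource/Interval simplified for illustration).
interface IntermediaryDataManager {
    long addSegment(String supervisorTaskId, String subTaskId, String segment, URI segmentLocation);
    Optional<byte[]> findPartitionFile(String supervisorTaskId, String subTaskId, String interval, int bucketId);
    void deletePartitions(String supervisorTaskId);
}

// Hypothetical shape of the amended ShuffleClient method; a deep-storage
// implementation would fetch from S3/HDFS/etc. instead of over HTTP.
interface ShuffleClient<P> {
    File fetchSegmentFile(URI partitionDir, String supervisorTaskId, P location) throws Exception;
}

// In-memory stand-in for LocalIntermediaryDataManager, keyed by
// supervisor task and subtask, just to demonstrate the lifecycle.
class InMemoryIntermediaryDataManager implements IntermediaryDataManager {
    private final Map<String, Map<String, byte[]>> partitions = new HashMap<>();

    @Override
    public long addSegment(String supervisorTaskId, String subTaskId, String segment, URI segmentLocation) {
        byte[] data = segment.getBytes();
        partitions.computeIfAbsent(supervisorTaskId, k -> new HashMap<>()).put(subTaskId, data);
        return data.length; // size pushed, mirroring the long return type
    }

    @Override
    public Optional<byte[]> findPartitionFile(String supervisorTaskId, String subTaskId, String interval, int bucketId) {
        return Optional.ofNullable(partitions.getOrDefault(supervisorTaskId, Map.of()).get(subTaskId));
    }

    @Override
    public void deletePartitions(String supervisorTaskId) {
        // Cleanup after the supervisor task finishes
        partitions.remove(supervisorTaskId);
    }
}

public class Main {
    public static void main(String[] args) {
        IntermediaryDataManager mgr = new InMemoryIntermediaryDataManager();
        long size = mgr.addSegment("supervisor-1", "sub-1", "segment-bytes", URI.create("file:///tmp/seg"));
        System.out.println(size > 0);
        System.out.println(mgr.findPartitionFile("supervisor-1", "sub-1", "2021/2022", 0).isPresent());
        mgr.deletePartitions("supervisor-1");
        System.out.println(mgr.findPartitionFile("supervisor-1", "sub-1", "2021/2022", 0).isPresent());
    }
}
```

With this split, swapping local disk for deep storage is a matter of binding a different `IntermediaryDataManager` and `ShuffleClient` implementation via an extension, without touching the push/pull call sites.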
