sashidhar edited a comment on issue #7641: Suboptimal usage of multiple segment cache locations
URL: https://github.com/apache/incubator-druid/issues/7641#issuecomment-507197077

@dclim

1. I want to understand whether the proposed algorithm has any side effects, for example on data locality. With the current algorithm, all the segments of a particular interval might end up in the same location (correct me if I am wrong). With the new algorithm, the segments of the same interval will be distributed among multiple locations. Does this have any effect on query performance?

2. `SegmentLoaderLocalCacheManager.loadSegmentWithRetry(..)`

```
private StorageLocation loadSegmentWithRetry(DataSegment segment, String storageDirStr) throws SegmentLoadingException
{
  for (StorageLocation loc : locations) {
    if (loc.canHandle(segment)) {
      File storageDir = new File(loc.getPath(), storageDirStr);
      try {
        loadInLocationWithStartMarker(segment, storageDir);
        return loc;
      }
      catch (SegmentLoadingException e) {
        ....
      }
    }
  }
  throw new SegmentLoadingException("Failed to load segment %s in all locations.", segment.getId());
}
```

Whenever `loadSegmentWithRetry()` is called, the for loop above iterates over `locations` from the start, so it keeps picking the same location (the first in the list) until that location's capacity is exhausted. Only then does it move on to the next one.

The implementation of the proposed algorithm would look something like this:

```
....
private StorageLocation loadSegmentWithRetry(DataSegment segment, String storageDirStr) throws SegmentLoadingException
{
  StorageLocation nextLocation = getNextLocation();
  if (nextLocation.canHandle(segment)) {
    // load segment
    return nextLocation;
  }
  ...
  ...
}

private StorageLocation getNextLocation()
{
  // Loop through the locations in a round robin fashion.
  return nextLocation;
}
```
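
To make the round-robin idea more concrete, here is a minimal, self-contained sketch. `SketchLocation` and `RoundRobinLocationPicker` are hypothetical stand-ins and not Druid classes, segment sizes are plain `long` values rather than `DataSegment`s, and falling back to the remaining locations when the chosen one is full is an assumption that the snippet above leaves open.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for a storage location; it only tracks used capacity.
final class SketchLocation
{
  final String path;
  private final long maxSize;
  private long currSize;

  SketchLocation(String path, long maxSize)
  {
    this.path = path;
    this.maxSize = maxSize;
  }

  // Checks capacity and reserves space in one step, so that two concurrent
  // callers cannot both claim the last free bytes.
  synchronized boolean tryReserve(long segmentSize)
  {
    if (currSize + segmentSize > maxSize) {
      return false;
    }
    currSize += segmentSize;
    return true;
  }
}

// Hypothetical picker: each call starts one position further along the list.
final class RoundRobinLocationPicker
{
  private final List<SketchLocation> locations;
  private final AtomicInteger cursor = new AtomicInteger(0);

  RoundRobinLocationPicker(List<SketchLocation> locations)
  {
    if (locations.isEmpty()) {
      throw new IllegalArgumentException("at least one location is required");
    }
    this.locations = new ArrayList<>(locations);
  }

  // Returns the location that accepted the segment, or null if none has room.
  SketchLocation pick(long segmentSize)
  {
    // floorMod keeps the index non-negative even after the counter overflows.
    int start = Math.floorMod(cursor.getAndIncrement(), locations.size());
    for (int i = 0; i < locations.size(); i++) {
      SketchLocation loc = locations.get((start + i) % locations.size());
      if (loc.tryReserve(segmentSize)) {
        return loc;
      }
    }
    return null;
  }
}
```

With two locations of equal capacity, successive calls to `pick` alternate between them instead of filling the first one before touching the second. This sketch says nothing about the data locality question in point 1.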
