dclim commented on issue #8038: Making optimal usage of multiple segment cache locations
URL: https://github.com/apache/incubator-druid/pull/8038#issuecomment-509948508

Ah, interesting - I thought I remembered that the behavior used to be to select the least filled disk! Looks like a regression at some point.

@sashidhar I do still think there's value in making the selector strategy configurable to something like round-robin, for the reason you mentioned.

An example: I was setting up a Druid cluster that had two volumes mounted (let's say they were each 10G and called /mnt and /mnt1). I was also using /mnt for other stuff (as a general scratch drive, for storing intermediate indexing files, log files, etc.), so I needed to reserve some space for this - let's say I reserved 2G. I had 8G left, so I set the size of the segment cache for /mnt to 8G.

Now, what do I set the size of the segment cache for /mnt1 to? If I set it to 10G to fully utilize the volume and at some point have less than 2G of data, it would all be on /mnt1 and potentially wouldn't be maximizing the available I/O throughput. I could instead set it to 8G, the same as /mnt, and that would distribute the segments evenly, but I'd lose those 2G unnecessarily just to coax the algorithm into using both locations.

A round-robin strategy (or one that selects the location that has the least bytes used in absolute terms instead of relative to the capacity) would have been what I wanted.
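To make the scenario above concrete, here is a small Python simulation of the two-volume setup (8G cache on /mnt, 10G cache on /mnt1). It is only a sketch and is not Druid code: the `Location` class, the strategy functions, and the segment sizes are all hypothetical names and values chosen for illustration. In particular, `pick_most_free_bytes` is an assumed model of a "fill relative to capacity" selector that reproduces the skew described; the actual Druid selection logic may differ.

```python
from dataclasses import dataclass

MB = 1024 ** 2
GB = 1024 ** 3


@dataclass
class Location:
    """Hypothetical stand-in for one segment cache location."""
    path: str
    max_bytes: int
    used_bytes: int = 0

    def free_bytes(self) -> int:
        return self.max_bytes - self.used_bytes

    def can_fit(self, size: int) -> bool:
        return size <= self.free_bytes()


def pick_most_free_bytes(locations, size):
    # Assumed model of the capacity-relative behavior: prefer the
    # location with the most remaining configured capacity.
    candidates = [loc for loc in locations if loc.can_fit(size)]
    return max(candidates, key=Location.free_bytes, default=None)


def pick_least_used_bytes(locations, size):
    # Prefer the location with the fewest bytes used in absolute terms.
    candidates = [loc for loc in locations if loc.can_fit(size)]
    return min(candidates, key=lambda loc: loc.used_bytes, default=None)


class RoundRobin:
    """Cycle through locations, skipping any that cannot fit the segment."""

    def __init__(self):
        self.next_index = 0

    def __call__(self, locations, size):
        n = len(locations)
        for offset in range(n):
            loc = locations[(self.next_index + offset) % n]
            if loc.can_fit(size):
                self.next_index = (self.next_index + offset + 1) % n
                return loc
        return None


def simulate(pick, segment_sizes=(400 * MB,) * 4):
    # Two volumes as in the comment: 8G usable on /mnt, 10G on /mnt1.
    locations = [Location("/mnt", 8 * GB), Location("/mnt1", 10 * GB)]
    for size in segment_sizes:
        loc = pick(locations, size)
        if loc is None:
            raise RuntimeError("no location can hold the segment")
        loc.used_bytes += size
    return {loc.path: loc.used_bytes for loc in locations}


if __name__ == "__main__":
    for name, pick in [
        ("most free bytes", pick_most_free_bytes),
        ("least used bytes", pick_least_used_bytes),
        ("round-robin", RoundRobin()),
    ]:
        usage = simulate(pick)
        print(name, {path: used // MB for path, used in usage.items()})
```

With four 400 MB segments (1.6G in total, under the 2G difference between the two caches), the capacity-relative model places everything on /mnt1, while both the absolute least-used-bytes and round-robin strategies split the segments evenly across /mnt and /mnt1.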
