dclim commented on issue #8038: Making optimal usage of multiple segment cache locations
URL: https://github.com/apache/incubator-druid/pull/8038#issuecomment-509948508
 
 
   Ah, interesting. I remembered the behavior being to select the least-filled
disk! Looks like a regression happened at some point.
   
   @sashidhar I do still think there's value in making the selector strategy
configurable to something like round-robin, for the reason you mentioned. An
example: I was setting up a Druid cluster that had two volumes mounted (let's
say each was 10G, called /mnt and /mnt1). I was also using /mnt for other
things (as a general scratch drive, and for intermediate indexing files, log
files, etc.), so I needed to reserve some space for this; let's say I reserved
2G. That left 8G, so I set the size of the segment cache for /mnt to 8G.
   
   Now, what do I set the size of the segment cache for /mnt1 to? If I set it
to 10G to fully utilize the volume, then whenever I had less than 2G of data
in total, it would all end up on /mnt1, which potentially wouldn't make full
use of the available I/O throughput. I could instead set it to 8G, the same as
/mnt, and that would evenly distribute the segments, but I'd lose those 2G
unnecessarily just to coax the algorithm into using both locations.
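To see why a small data set lands entirely on /mnt1, here is a minimal Python simulation. It assumes the selector picks whichever location has the most free bytes (configured size minus bytes already used). That is one reading of "least filled" that matches the scenario above. The dict fields, the 100 MiB segment size, and the function names are hypothetical; this is not Druid's actual code.

```python
MIB = 1024 ** 2
GIB = 1024 * MIB

def pick_most_available(locations):
    # Hypothetical "least filled" rule: pick the location with the most free
    # bytes (configured size minus bytes used). max() keeps the first
    # location listed when there is a tie.
    return max(locations, key=lambda loc: loc["max_size"] - loc["used"])

def place_segments(locations, segment_sizes, pick):
    # Assign each segment to the chosen location and record where it went.
    placements = []
    for size in segment_sizes:
        loc = pick(locations)
        if loc["max_size"] - loc["used"] < size:
            raise RuntimeError("no location has room for this segment")
        loc["used"] += size
        placements.append(loc["path"])
    return placements

# /mnt is capped at 8G (2G reserved for other uses); /mnt1 gets the full 10G.
locations = [
    {"path": "/mnt", "max_size": 8 * GIB, "used": 0},
    {"path": "/mnt1", "max_size": 10 * GIB, "used": 0},
]

# About 1.5G of data as fifteen 100 MiB segments (hypothetical sizes).
placements = place_segments(locations, [100 * MIB] * 15, pick_most_available)
print(placements.count("/mnt"), placements.count("/mnt1"))  # 0 15
```

/mnt1 keeps more free space than /mnt's 8G until roughly 2G has landed on it, so every segment in this example goes to /mnt1.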
   
   A round-robin strategy (or one that selects the location with the fewest
bytes used in absolute terms, instead of relative to capacity) would have
been what I wanted.
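The two alternatives can be sketched the same way. This is again a hypothetical Python model, not Druid's implementation:

- `least_bytes_used` compares absolute bytes used.
- `RoundRobin` cycles through the locations in order, skipping any that lack room.

With /mnt capped at 8G and /mnt1 at the full 10G, both spread a small data set across the two volumes.

```python
MIB = 1024 ** 2
GIB = 1024 * MIB

def make_locations():
    # Hypothetical layout from the example: /mnt capped at 8G, /mnt1 at 10G.
    return [
        {"path": "/mnt", "max_size": 8 * GIB, "used": 0},
        {"path": "/mnt1", "max_size": 10 * GIB, "used": 0},
    ]

def has_room(loc, size):
    return loc["max_size"] - loc["used"] >= size

def least_bytes_used(locations, size):
    # Among locations that can fit the segment, pick the one with the fewest
    # bytes used in absolute terms (capacity is ignored). First wins on ties.
    candidates = [loc for loc in locations if has_room(loc, size)]
    return min(candidates, key=lambda loc: loc["used"]) if candidates else None

class RoundRobin:
    # Cycle through locations in order, skipping any that cannot fit the segment.
    def __init__(self):
        self.next_index = 0

    def __call__(self, locations, size):
        n = len(locations)
        for offset in range(n):
            i = (self.next_index + offset) % n
            if has_room(locations[i], size):
                self.next_index = (i + 1) % n
                return locations[i]
        return None

def place_segments(locations, segment_sizes, select):
    # Assign each segment to the selected location and record where it went.
    placements = []
    for size in segment_sizes:
        loc = select(locations, size)
        if loc is None:
            raise RuntimeError("no location has room for this segment")
        loc["used"] += size
        placements.append(loc["path"])
    return placements

def run(selector_factory, sizes):
    # Fresh locations and a fresh selector per run; returns (/mnt, /mnt1) counts.
    placements = place_segments(make_locations(), sizes, selector_factory())
    return placements.count("/mnt"), placements.count("/mnt1")

small = [100 * MIB] * 15  # about 1.5G of data
full = [GIB] * 18         # 18G, exactly the combined capacity

print(run(RoundRobin, small), run(lambda: least_bytes_used, small))  # (8, 7) (8, 7)
print(run(RoundRobin, full), run(lambda: least_bytes_used, full))    # (8, 10) (8, 10)
```

In this model, the 2G reserve on /mnt only caps that location. Unlike the 8G/8G workaround, /mnt1's extra 2G still gets used once /mnt is full.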

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
With regards,
Apache Git Services
