jihoonson commented on a change in pull request #8038: Making optimal usage of multiple segment cache locations
URL: https://github.com/apache/incubator-druid/pull/8038#discussion_r325520829
 
 

 ##########
 File path: server/src/main/java/org/apache/druid/segment/loading/LeastBytesUsedStorageLocationSelectorStrategy.java
 ##########
 @@ -0,0 +1,51 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one
+ * or more contributor license agreements.  See the NOTICE file
+ * distributed with this work for additional information
+ * regarding copyright ownership.  The ASF licenses this file
+ * to you under the Apache License, Version 2.0 (the
+ * "License"); you may not use this file except in compliance
+ * with the License.  You may obtain a copy of the License at
+ *
+ *   http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing,
+ * software distributed under the License is distributed on an
+ * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+ * KIND, either express or implied.  See the License for the
+ * specific language governing permissions and limitations
+ * under the License.
+ */
+
+package org.apache.druid.segment.loading;
+
+import com.google.common.collect.ImmutableList;
+import com.google.common.collect.Ordering;
+import org.apache.druid.timeline.DataSegment;
+
+import java.util.Comparator;
+import java.util.Iterator;
+
+/**
+ * A {@link StorageLocation} selector strategy that selects a segment cache location that is least filled each time
+ * among the available storage locations.
+ */
+public class LeastBytesUsedStorageLocationSelectorStrategy implements StorageLocationSelectorStrategy
+{
+  private static final Comparator<StorageLocation> COMPARATOR = Comparator
+      .comparingLong(StorageLocation::currSizeBytes);
+
+  private ImmutableList<StorageLocation> storageLocations;
+
+  @Override
+  public Iterator<StorageLocation> getLocations(DataSegment dataSegment, String storageDirStr)
+  {
+    return Ordering.from(COMPARATOR).sortedCopy(this.storageLocations).iterator();
 
 Review comment:
   I've been thinking about how important it is to return locations in the correct order when multiple threads are involved. It would be nice to have if we can fix it easily, but it doesn't look mandatory. First of all, the worst thing that can happen is that two threads pick the same location at the same time, and after one of them has loaded a segment there is no physical space left in that location. If this happens, one thread fails to load its segment but picks another location for the next segment, and the failed segment is loaded by another thread or another historical. This sounds OK, and I believe this worst case will not even happen often, since `StorageLocation` is a logical abstraction over some location on the underlying disk, and the physical max space is usually bigger than the configured max for `StorageLocation`.
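
   A minimal, single-threaded sketch of this scenario, replaying the interleaving by hand. `RaceLocation`, `LeastUsedPicker`, and `StaleSnapshotDemo` are hypothetical stand-ins, not Druid classes, and the sketch collapses the logical and physical limits into a single `maxSizeBytes` for brevity:

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for a storage location; not Druid's StorageLocation.
final class RaceLocation
{
  private final String name;
  private final long maxSizeBytes;
  private long currSizeBytes;

  RaceLocation(String name, long maxSizeBytes, long currSizeBytes)
  {
    this.name = name;
    this.maxSizeBytes = maxSizeBytes;
    this.currSizeBytes = currSizeBytes;
  }

  String name()
  {
    return name;
  }

  synchronized long currSizeBytes()
  {
    return currSizeBytes;
  }

  // Reserves space only if the location can still hold the segment.
  synchronized boolean reserve(long bytes)
  {
    if (currSizeBytes + bytes > maxSizeBytes) {
      return false;
    }
    currSizeBytes += bytes;
    return true;
  }
}

// Hypothetical selector: picks the location with the fewest bytes used right now.
final class LeastUsedPicker
{
  static RaceLocation pick(List<RaceLocation> locations)
  {
    return locations.stream()
                    .min(Comparator.comparingLong(RaceLocation::currSizeBytes))
                    .orElseThrow(IllegalStateException::new);
  }
}

final class StaleSnapshotDemo
{
  public static void main(String[] args)
  {
    List<RaceLocation> locations = Arrays.asList(
        new RaceLocation("a", 100, 0),
        new RaceLocation("b", 100, 10)
    );

    // Both "threads" select before either one reserves, so both see "a".
    RaceLocation seenByThread1 = LeastUsedPicker.pick(locations);
    RaceLocation seenByThread2 = LeastUsedPicker.pick(locations);

    boolean thread1Loaded = seenByThread1.reserve(80); // succeeds
    boolean thread2Loaded = seenByThread2.reserve(80); // fails: "a" is now too full

    // For its next segment, the losing thread selects again and moves on to "b".
    RaceLocation nextPick = LeastUsedPicker.pick(locations);

    System.out.println(thread1Loaded + " " + thread2Loaded + " " + nextPick.name());
  }
}
```

   The point of the sketch is that the stale ordering costs at most one failed load; the next selection already reflects the updated sizes.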
   
   Also, it's not easy to fix. To fix it properly, `Collections.sort(locations)` and `StorageLocation.reserve()` would have to be called as a single atomic operation, and they would have to use the same lock as `StorageLocation.removeFile()` and `StorageLocation.removeSegmentDir()`. This means the current split between `StorageLocation` and `StorageLocationSelectorStrategy` is probably not enough, and we might need a new abstraction that also encapsulates all the concurrency issues and accesses to the underlying storage. But I don't think this is worth doing for now.
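
   A rough sketch of what such an abstraction could look like, assuming a single lock shared by selection, reservation, and release. `SketchLocation` and `LockedLocationPool` are hypothetical names used only for illustration; this is not the PR's code or Druid's API:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;

// Hypothetical stand-in for a storage location; all mutation goes through the pool below.
final class SketchLocation
{
  private final String name;
  private final long maxSizeBytes;
  private long currSizeBytes;

  SketchLocation(String name, long maxSizeBytes)
  {
    this.name = name;
    this.maxSizeBytes = maxSizeBytes;
  }

  String name()
  {
    return name;
  }

  long currSizeBytes()
  {
    return currSizeBytes;
  }

  boolean canHold(long bytes)
  {
    return currSizeBytes + bytes <= maxSizeBytes;
  }

  void add(long bytes)
  {
    currSizeBytes += bytes;
  }

  void subtract(long bytes)
  {
    currSizeBytes = Math.max(0, currSizeBytes - bytes);
  }
}

// Hypothetical abstraction that owns the one lock guarding sort, reserve, and release.
final class LockedLocationPool
{
  private final Object lock = new Object();
  private final List<SketchLocation> locations;

  LockedLocationPool(List<SketchLocation> locations)
  {
    this.locations = new ArrayList<>(locations);
  }

  // Sorts by bytes used and reserves in the first location that fits, as one atomic step.
  Optional<SketchLocation> selectAndReserve(long segmentBytes)
  {
    synchronized (lock) {
      List<SketchLocation> sorted = new ArrayList<>(locations);
      sorted.sort(Comparator.comparingLong(SketchLocation::currSizeBytes));
      for (SketchLocation location : sorted) {
        if (location.canHold(segmentBytes)) {
          location.add(segmentBytes);
          return Optional.of(location);
        }
      }
      return Optional.empty();
    }
  }

  // Releasing space takes the same lock, so it cannot interleave with a selection.
  void release(SketchLocation location, long segmentBytes)
  {
    synchronized (lock) {
      location.subtract(segmentBytes);
    }
  }
}
```

   Note that the selector no longer hands out an iterator at all: callers get back a location that is already reserved, which is why this would cut across the current `StorageLocation` / `StorageLocationSelectorStrategy` split.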
   
   So, I'm OK with just returning a snapshot of the storage locations, as it does now, as long as this behavior is explicitly documented in the javadoc of the `StorageLocationSelectorStrategy` interface.
   
   @sashidhar @himanshug @dclim @nishantmonu51 what do you guys think?
