[GitHub] [druid] jihoonson commented on a change in pull request #10125: Fix CachingClusteredClient when querying specific segments

GitBox Thu, 02 Jul 2020 13:37:44 -0700


jihoonson commented on a change in pull request #10125:
URL: https://github.com/apache/druid/pull/10125#discussion_r449254333




##########
File path: 
server/src/main/java/org/apache/druid/client/CachingClusteredClient.java
##########
@@ -401,11 +407,16 @@ private ClusterQueryResult(Sequence<T> sequence, int 
numQueryServers)
       }
     }
 
-    private Set<SegmentServerSelector> 
computeSegmentsToQuery(TimelineLookup<String, ServerSelector> timeline)
+    private Set<SegmentServerSelector> computeSegmentsToQuery(
+        TimelineLookup<String, ServerSelector> timeline,
+        boolean specificSegments
+    )
     {
+      final java.util.function.Function<Interval, 
List<TimelineObjectHolder<String, ServerSelector>>> lookupFn
+          = specificSegments ? timeline::lookupWithIncompletePartitions : 
timeline::lookup;
       final List<TimelineObjectHolder<String, ServerSelector>> serversLookup = 
toolChest.filterSegments(
           query,
-          intervals.stream().flatMap(i -> 
timeline.lookup(i).stream()).collect(Collectors.toList())
+          intervals.stream().flatMap(i -> 
lookupFn.apply(i).stream()).collect(Collectors.toList())

Review comment:
       > Hmm, although, `getQueryRunnerForSegments` does create a modified 
timeline that only has the relevant segments, so this might be okay as long as 
we can assume none of them overshadow the others. Which should be a fair 
assumption for a query that originally started out as being by intervals. It 
seems a bit sketchy, though, and would be nice if we could avoid the 
unnecessary creation of this timeline.
   
   I agree. That would be cleaner and even more efficient since we can avoid 
unnecessary timeline build and lookup on it. However, since we are about to 
release our next version, I would like to avoid such big refactoring for now 
though. It will be more risky and potentially delay the release.
   
   > But if we stick with this approach — will this properly retain information 
about the relevant SegmentDescriptors' intervals, though? These could be 
narrower than the segments' full intervals.
   
   It does truncate the interval, but the query server can handle it. The query 
runner for specific segments in `ServerManager` or `AppenderatorImpl` uses 
`TimelineLookup.findEntry()` to find the `PartitionHolder` containing the 
segments to query. In this `findEntry()`, an entry matches if its interval 
contains the given interval.
   
   ```java
     public @Nullable PartitionHolder<ObjectType> findEntry(Interval interval, 
VersionType version)
     {
       lock.readLock().lock();
       try {
         for (Entry<Interval, TreeMap<VersionType, TimelineEntry>> entry : 
allTimelineEntries.entrySet()) {
           if (entry.getKey().equals(interval) || 
entry.getKey().contains(interval)) {
             TimelineEntry foundEntry = entry.getValue().get(version);
             if (foundEntry != null) {
               return foundEntry.getPartitionHolder().asImmutable();
             }
           }
         }
   
         return null;
       }
       finally {
         lock.readLock().unlock();
       }
     }
   ```




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [druid] jihoonson commented on a change in pull request #10125: Fix CachingClusteredClient when querying specific segments

Reply via email to