dongxiaoman commented on a change in pull request #7243:
URL: https://github.com/apache/pinot/pull/7243#discussion_r686940891



##########
File path: pinot-broker/src/main/java/org/apache/pinot/broker/routing/segmentpruner/PartitionSegmentPruner.java
##########
@@ -163,48 +181,44 @@ public synchronized void refreshSegment(String segment) {
       if (filterExpression == null) {
         return segments;
       }
-      Set<String> selectedSegments = new HashSet<>();
-      for (String segment : segments) {
-        PartitionInfo partitionInfo = _partitionInfoMap.get(segment);
-        if (partitionInfo == null || partitionInfo == INVALID_PARTITION_INFO || isPartitionMatch(filterExpression,
-            partitionInfo)) {
-          selectedSegments.add(segment);
-        }
-      }
-      return selectedSegments;
+      return pruneSegments((partitionInfo, cachedPartitionFunction) -> isPartitionMatch(filterExpression,
+        partitionInfo, cachedPartitionFunction));
     } else {
       // PQL
      FilterQueryTree filterQueryTree = RequestUtils.generateFilterQueryTree(brokerRequest);
       if (filterQueryTree == null) {
         return segments;
       }
-      Set<String> selectedSegments = new HashSet<>();
-      for (String segment : segments) {
-        PartitionInfo partitionInfo = _partitionInfoMap.get(segment);
-        if (partitionInfo == null || partitionInfo == INVALID_PARTITION_INFO || isPartitionMatch(filterQueryTree,
-            partitionInfo)) {
-          selectedSegments.add(segment);
-        }
+      return pruneSegments((partitionInfo, cachedPartitionFunction) -> isPartitionMatch(filterQueryTree, partitionInfo, cachedPartitionFunction));
+    }
+  }
+
+  private Set<String> pruneSegments(java.util.function.BiFunction<PartitionInfo, CachedPartitionFunction, Boolean> partitionMatchLambda) {
+    Set<String> selectedSegments = new HashSet<>();
+    CachedPartitionFunction cachedPartitionFunction = new CachedPartitionFunction();

Review comment:
       The Cache/LookupTable here is not for our segment names; it is for the value -> partition-number mapping within each SQL statement. E.g., if we have something like `WHERE account_key = 'a7cd24e'`, we need to compute `MurmurHash.getPartition('a7cd24e')` repeatedly, once per segment EQUALS comparison. If we have 10k segments, we may end up computing `MurmurHash.getPartition('a7cd24e')` 10k times.
   Because any value can be passed in a SQL statement, it makes sense to create the cache per SQL statement.
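
   To illustrate the idea (this is a minimal sketch, not the actual Pinot `CachedPartitionFunction` implementation; the class name, the stand-in hash, and the `computeCount` counter are assumptions for demonstration): a per-query cache memoizes value -> partition number, so a filter literal is hashed at most once no matter how many segments are checked.

   ```java
   import java.util.HashMap;
   import java.util.Map;
   import java.util.function.ToIntFunction;

   // Hypothetical sketch of a per-query value -> partition cache.
   public class PerQueryPartitionCache {
     private final ToIntFunction<String> _partitionFunction; // e.g. Murmur-based
     private final Map<String, Integer> _cache = new HashMap<>();
     public int computeCount = 0; // illustration only: counts real hash computations

     public PerQueryPartitionCache(ToIntFunction<String> partitionFunction) {
       _partitionFunction = partitionFunction;
     }

     public int getPartition(String value) {
       // computeIfAbsent ensures each distinct literal is hashed at most once
       return _cache.computeIfAbsent(value, v -> {
         computeCount++;
         return _partitionFunction.applyAsInt(v);
       });
     }

     public static void main(String[] args) {
       // Assume 4 partitions and a stand-in hash (not MurmurHash)
       PerQueryPartitionCache cache =
           new PerQueryPartitionCache(v -> Math.abs(v.hashCode()) % 4);
       // Simulate checking the same filter literal against 10k segments
       for (int i = 0; i < 10_000; i++) {
         cache.getPartition("a7cd24e");
       }
       System.out.println(cache.computeCount); // hashed once, not 10k times
     }
   }
   ```

   Creating the cache per query (rather than globally) sidesteps invalidation concerns, since arbitrary literals can appear in each SQL statement.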




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


