[GitHub] [pinot] gortiz commented on a diff in pull request #8766: Optimize ColumnValueSegmentPruner by caching value hashes

GitBox Thu, 26 May 2022 00:49:24 -0700


gortiz commented on code in PR #8766:
URL: https://github.com/apache/pinot/pull/8766#discussion_r882409290



##########
pinot-core/src/main/java/org/apache/pinot/core/query/pruner/ColumnValueSegmentPruner.java:
##########
@@ -91,23 +95,28 @@ public List<IndexSegment> prune(List<IndexSegment> segments, QueryContext query)
     // Extract EQ/IN/RANGE predicate columns
     Set<String> eqInColumns = new HashSet<>();
     Set<String> rangeColumns = new HashSet<>();
-    extractPredicateColumns(filter, eqInColumns, rangeColumns);
+    // As Predicates are recursive structures, their hashCode is quite expensive.
+    // By using an IdentityHashMap here we don't need to iterate over the recursive
+    // structure. This is especially useful in the IN expression.
+    Map<Predicate, Object> cachedValues = new IdentityHashMap<>();
+    extractPredicateColumns(filter, eqInColumns, rangeColumns, cachedValues);
 
     if (eqInColumns.isEmpty() && rangeColumns.isEmpty()) {
       return segments;
     }
 
     int numSegments = segments.size();
     List<IndexSegment> selectedSegments = new ArrayList<>(numSegments);
+
     if (!eqInColumns.isEmpty() && query.isEnablePrefetch()) {
       Map[] dataSourceCaches = new Map[numSegments];
       FetchContext[] fetchContexts = new FetchContext[numSegments];
       try {
         // Prefetch bloom filter for columns within the EQ/IN predicate if exists
         for (int i = 0; i < numSegments; i++) {
           IndexSegment segment = segments.get(i);
-          Map<String, DataSource> dataSourceCache = new HashMap<>();
-          Map<String, List<ColumnIndexType>> columnToIndexList = new HashMap<>();
+          Map<String, DataSource> dataSourceCache = new HashMap<>(eqInColumns.size());
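For illustration, the effect described in the diff comment about `IdentityHashMap` can be reproduced with a small standalone sketch. It assumes a hypothetical recursive `Node` class as a stand-in for Pinot's `Predicate` (its `hashCode()` recurses into every child, like an IN predicate with many values) and counts `hashCode()` invocations. `IdentityHashMap` hashes keys with `System.identityHashCode` and compares them with `==`, so it should never call `Node.hashCode()`, while a regular `HashMap` recomputes the structural hash on every `put` and `get`.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.IdentityHashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

class IdentityCacheDemo {
  // Hypothetical recursive expression node; NOT Pinot's Predicate class.
  static final class Node {
    static int hashCodeCalls = 0;  // counts every hashCode() invocation

    final String value;
    final List<Node> children;

    Node(String value, List<Node> children) {
      this.value = value;
      this.children = children;
    }

    @Override
    public int hashCode() {
      hashCodeCalls++;
      // Structural hash: visits every child, so the cost grows with the tree.
      return Objects.hash(value, children);
    }

    @Override
    public boolean equals(Object o) {
      if (this == o) {
        return true;
      }
      if (!(o instanceof Node)) {
        return false;
      }
      Node other = (Node) o;
      return value.equals(other.value) && children.equals(other.children);
    }
  }

  // Builds an IN-like node with `numValues` leaf children.
  static Node inPredicate(int numValues) {
    List<Node> values = new ArrayList<>(numValues);
    for (int i = 0; i < numValues; i++) {
      values.add(new Node("v" + i, Collections.emptyList()));
    }
    return new Node("IN", values);
  }

  // Performs one put and one get, returning how many hashCode() calls they triggered.
  static int hashCodeCallsFor(Map<Node, Object> cache, Node key) {
    Node.hashCodeCalls = 0;
    cache.put(key, Boolean.TRUE);
    cache.get(key);
    return Node.hashCodeCalls;
  }

  public static void main(String[] args) {
    Node in = inPredicate(1000);
    System.out.println("IdentityHashMap: " + hashCodeCallsFor(new IdentityHashMap<>(), in));
    System.out.println("HashMap:         " + hashCodeCallsFor(new HashMap<>(), in));
  }
}
```

The trade-off is that an identity-keyed cache only hits when the very same object is passed back, which fits a cache that is filled and queried while walking the same filter tree.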

Review Comment:
   I added the initial size because I saw some resizes in the flame graph. Anyway, the new implementation doesn't use `dataSourceCache` when immutable segments are used, so the gain won't be as noticeable as before.
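A related detail when pre-sizing: in OpenJDK, `new HashMap<>(n)` rounds `n` up to a power of two and resizes once the number of entries exceeds capacity × 0.75 (the default load factor), so `new HashMap<>(4)` still resizes when the fourth entry is added. The sketch below uses a hypothetical `capacityFor` helper that divides the expected size by the load factor first. JDK 19+ offers `HashMap.newHashMap(int)` and Guava offers `Maps.newHashMapWithExpectedSize(int)` for the same purpose.

```java
import java.util.HashMap;
import java.util.Map;

final class MapSizing {
  // HashMap's default load factor, as documented in the HashMap Javadoc.
  private static final double DEFAULT_LOAD_FACTOR = 0.75;

  // Hypothetical helper: smallest initial capacity that keeps `expectedSize`
  // entries at or below the resize threshold (capacity * load factor).
  static int capacityFor(int expectedSize) {
    return (int) Math.ceil(expectedSize / DEFAULT_LOAD_FACTOR);
  }

  static <K, V> Map<K, V> newPresizedHashMap(int expectedSize) {
    return new HashMap<>(capacityFor(expectedSize));
  }

  public static void main(String[] args) {
    // new HashMap<>(4) gets a 4-bucket table with threshold 3, so a 4th entry
    // triggers a resize; capacityFor(4) requests 6, which rounds up to 8 (threshold 6).
    System.out.println(capacityFor(4));
  }
}
```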
   
   IMHO, if the map is very small, it doesn't really matter if there are a lot of collisions, as a linear probe will be fast. In fact, for a very small map, an ArrayList that filters by hash code before calling equals would probably be faster than a map.
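A minimal sketch of that idea, for illustration only: a hypothetical `SmallListMap` (not a Pinot or JDK class) that stores each key's hash code next to the key in an `ArrayList` and scans linearly, comparing the cached `int` hash before calling `equals`. Whether it actually beats `HashMap` at a given size would need to be measured.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Hypothetical list-backed map for a handful of entries (not a Pinot or JDK class).
final class SmallListMap<K, V> {
  private static final class Entry<K, V> {
    final int hash;  // cached hash code of the key
    final K key;
    V value;

    Entry(int hash, K key, V value) {
      this.hash = hash;
      this.key = key;
      this.value = value;
    }
  }

  private final List<Entry<K, V>> entries;

  SmallListMap(int expectedSize) {
    entries = new ArrayList<>(expectedSize);
  }

  // Linear scan: the cheap int comparison filters out most entries,
  // so equals() only runs when the hash codes match.
  private Entry<K, V> find(Object key, int hash) {
    for (Entry<K, V> e : entries) {
      if (e.hash == hash && Objects.equals(e.key, key)) {
        return e;
      }
    }
    return null;
  }

  V get(Object key) {
    Entry<K, V> e = find(key, Objects.hashCode(key));
    return e == null ? null : e.value;
  }

  // Returns the previous value, or null if the key was absent (like Map.put).
  V put(K key, V value) {
    int hash = Objects.hashCode(key);
    Entry<K, V> e = find(key, hash);
    if (e != null) {
      V old = e.value;
      e.value = value;
      return old;
    }
    entries.add(new Entry<>(hash, key, value));
    return null;
  }

  int size() {
    return entries.size();
  }

  public static void main(String[] args) {
    SmallListMap<String, Integer> columns = new SmallListMap<>(2);
    columns.put("colA", 1);
    columns.put("colB", 2);
    System.out.println(columns.get("colB"));
  }
}
```

Lookups are O(n), so this only makes sense for a handful of entries, for example a filter with just a few EQ/IN columns.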
   
   



