gortiz commented on code in PR #8766:
URL: https://github.com/apache/pinot/pull/8766#discussion_r882436535


##########
pinot-core/src/main/java/org/apache/pinot/core/query/pruner/ColumnValueSegmentPruner.java:
##########
@@ -190,6 +201,33 @@ private void extractPredicateColumns(FilterContext filter, 
Set<String> eqInColum
         String column = lhs.getIdentifier();
 
         Predicate.Type predicateType = predicate.getType();
+        switch (predicateType) {
+          case EQ: {
+            eqInColumns.add(column);
+            cachedValues.put(predicate, new CachedValue(((EqPredicate) 
predicate).getValue()));

Review Comment:
   Nice catch! I was so focused on the IN case that didn't see that problem int 
he EQ case. 
   
   This can be a big problem. As we need to use the hash of the converted value 
and the converted value depends on each column, we will need to either pick one 
type as the _main_ one and then not prune segments whose column type doesn't 
match with it or we would need to have a map from datatype to Pair<Long, Long> 
(or equivalent) to do not evaluate the hash every time. Both solutions are bad 
in terms of performance.
   
   The first option seems cheap to implement (we don't need to lookup for the 
data type every time) and we always think that there should be a very small 
proportion of segments with a different column type, but given that we don't 
know the actual type of the column in the schema, we need to either pick the 
_main_ type with some heuristic (randomly, the first we find, etc) or iterate 
over all the segments to look for the most used datatype for each column, which 
may be expensive.
   
   I'm going to think about that and try some implementations. Also, I'm going 
to change `ColumnValueSegmentPrunerTest` to use doubles instead if ints. This 
should automatically tests errors like this without the invaluable review of 
Jackie.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to