gortiz commented on code in PR #8766:
URL: https://github.com/apache/pinot/pull/8766#discussion_r882436535
##########
pinot-core/src/main/java/org/apache/pinot/core/query/pruner/ColumnValueSegmentPruner.java:
##########
@@ -190,6 +201,33 @@ private void extractPredicateColumns(FilterContext filter,
Set<String> eqInColum
String column = lhs.getIdentifier();
Predicate.Type predicateType = predicate.getType();
+ switch (predicateType) {
+ case EQ: {
+ eqInColumns.add(column);
+ cachedValues.put(predicate, new CachedValue(((EqPredicate)
predicate).getValue()));
Review Comment:
Nice catch! I was so focused on the IN case that didn't see that problem int
he EQ case.
This can be a big problem. As we need to use the hash of the converted value
and the converted value depends on each column, we will need to either pick one
type as the _main_ one and then not prune segments whose column type doesn't
match with it or we would need to have a map from datatype to Pair<Long, Long>
(or equivalent) to do not evaluate the hash every time. Both solutions are bad
in terms of performance.
The first option seems cheap to implement (we don't need to lookup for the
data type every time) and we always think that there should be a very small
proportion of segments with a different column type, but given that we don't
know the actual type of the column in the schema, we need to either pick the
_main_ type with some heuristic (randomly, the first we find, etc) or iterate
over all the segments to look for the most used datatype for each column, which
may be expensive.
I'm going to think about that and try some implementations. Also, I'm going
to change `ColumnValueSegmentPrunerTest` to use doubles instead if ints. This
should automatically tests errors like this without the invaluable review of
Jackie.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]