itschrispeck commented on code in PR #13199:
URL: https://github.com/apache/pinot/pull/13199#discussion_r1609536931
##########
pinot-core/src/main/java/org/apache/pinot/core/operator/filter/AndFilterOperator.java:
##########
@@ -59,13 +60,14 @@ protected BlockDocIdSet getTrues() {
protected BlockDocIdSet getFalses() {
List<BlockDocIdSet> blockDocIdSets = new ArrayList<>(_filterOperators.size());
for (BaseFilterOperator filterOperator : _filterOperators) {
- if (filterOperator.isResultEmpty()) {
- blockDocIdSets.add(new MatchAllDocIdSet(_numDocs));
+ if (_nullHandlingEnabled) {
+ blockDocIdSets.add(
+ new OrDocIdSet(Arrays.asList(filterOperator.getTrues(), filterOperator.getNulls()), _numDocs));
Review Comment:
I think that makes sense as an optimization; I've added the cases.
I don't think the second point is a regression. Previously `getFalses()`
created an `OrDocIdSet` for every predicate anyway, so we'd see something like
this:
```
OrDocIdSet(NotDocIdSet(OrDocIdSet(...)), NotDocIdSet(OrDocIdSet(...)))
```
with the change in this PR it would instead be:
```
NotDocIdSet(AndDocIdSet(OrDocIdSet(...), OrDocIdSet(...)))
```
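To make the equivalence of the two shapes concrete, here is a minimal sketch that models doc ID sets with `java.util.BitSet`. It is an illustration only: `FalsesDeMorganSketch` and its helpers are hypothetical names, not Pinot classes, and `BitSet` merely stands in for `BlockDocIdSet`. It assumes each inner `OrDocIdSet(...)` is the union of a predicate's trues and nulls, as in the diff above.

```java
import java.util.BitSet;

// Hypothetical illustration: BitSet stands in for Pinot's BlockDocIdSet.
// None of these helpers are part of the Pinot API.
final class FalsesDeMorganSketch {

  // Complement within [0, numDocs), conceptually what NotDocIdSet does.
  static BitSet not(BitSet docs, int numDocs) {
    BitSet result = (BitSet) docs.clone();
    result.flip(0, numDocs);
    return result;
  }

  // Union of two doc ID sets, conceptually OrDocIdSet.
  static BitSet or(BitSet left, BitSet right) {
    BitSet result = (BitSet) left.clone();
    result.or(right);
    return result;
  }

  // Intersection of two doc ID sets, conceptually AndDocIdSet.
  static BitSet and(BitSet left, BitSet right) {
    BitSet result = (BitSet) left.clone();
    result.and(right);
    return result;
  }

  // Previous shape: Or(Not(Or(trues1, nulls1)), Not(Or(trues2, nulls2))).
  static BitSet falsesOldShape(BitSet t1, BitSet n1, BitSet t2, BitSet n2, int numDocs) {
    return or(not(or(t1, n1), numDocs), not(or(t2, n2), numDocs));
  }

  // Shape after the change: Not(And(Or(trues1, nulls1), Or(trues2, nulls2))).
  static BitSet falsesNewShape(BitSet t1, BitSet n1, BitSet t2, BitSet n2, int numDocs) {
    return not(and(or(t1, n1), or(t2, n2)), numDocs);
  }

  // Convenience constructor for a doc ID set from explicit IDs.
  static BitSet docs(int... ids) {
    BitSet result = new BitSet();
    for (int id : ids) {
      result.set(id);
    }
    return result;
  }

  public static void main(String[] args) {
    int numDocs = 8;
    BitSet t1 = docs(0, 1, 2);
    BitSet n1 = docs(3);
    BitSet t2 = docs(1, 2, 4);
    BitSet n2 = docs(5);

    BitSet oldShape = falsesOldShape(t1, n1, t2, n2, numDocs);
    BitSet newShape = falsesNewShape(t1, n1, t2, n2, numDocs);

    // By De Morgan's law both shapes select the same documents.
    System.out.println(oldShape.equals(newShape)); // prints "true"
  }
}
```

Both shapes return the same documents. The second one needs a single complement over the intersection instead of one complement per child, and the intersection can be driven by the cheapest child.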
The speedup for this specific query is very apparent. I did a quick comparison
using a quickstart dataset, but with 60k rows:
```
q1: select count(*) from fineFoodReviews where NOT regexp_like("Text", 'happen to be allergic to it')
q2: select count(*) from fineFoodReviews where NOT text_match("Text", '"happen to be allergic to it"')
q3: select count(*) from fineFoodReviews where NOT (text_match("Text", '"happen to be allergic to it"') AND regexp_like("Text", 'happen to be allergic to it'))

q1: 39ms
q2: 5ms
q3: 6ms
```
Without the change, q3 latency is >= q1. I'm happy to set up a microbench if
you think it'd be helpful.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]