itschrispeck opened a new pull request, #13199: URL: https://github.com/apache/pinot/pull/13199
Problem: https://github.com/apache/pinot/pull/11185 added proper support for null handling. One side effect is that the order of execution changed. This has performance implications for queries where a bitmap-based filter operator can reduce the number of times an expensive expression such as `regexp_like` is evaluated. In summary, `NOT (a AND b)` is now executed as `NOT a OR NOT b`.

For example, affected queries could look like:

```
NOT (text_match(col, '...') AND regexp_like(col, '...'))
```

In this case, that PR changed `NotFilterOperator` to use `AndFilterOperator.getFalses()`, which [builds an `OrDocIdSet` from the false DocIdSets](https://github.com/apache/pinot/pull/11185/files#diff-13077035b35ccc6f9b73625c2315d9571a80fd9fedcec2d75f689f9b335e22aaR59-R68) instead of using a `NotDocIdIterator` built from the `AndDocIdSet`, as the old implementation did. This PR changes the implementation back to the old approach, while still handling nulls properly.

Open question: the behavior of `NOT (a OR b)` was also changed, so that it is executed as `NOT a AND NOT b`. I'm not sure whether it's better to leave this as is, even though the order of execution is implicitly changed, since the change probably benefits most queries. One option is to ensure the implementation matches the query as written, and then use an optimizer if we think this case should be executed differently.

tags: `bugfix` `performance`
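To make the cost difference concrete, here is a minimal sketch in Python. It is a toy model, not Pinot code: `CountingPredicate`, `not_of_and`, and `or_of_nots` are hypothetical names, doc IDs are plain integers, a set stands in for the bitmap-based filter, and null handling is left out. It assumes the scan-based predicate has to be evaluated for every document when its falses are computed independently.

```python
class CountingPredicate:
    """Wraps a per-document predicate and counts how often it is evaluated."""

    def __init__(self, fn):
        self.fn = fn
        self.calls = 0

    def __call__(self, doc_id):
        self.calls += 1
        return self.fn(doc_id)


def not_of_and(num_docs, bitmap_matches, expensive):
    """NOT (a AND b): the expensive predicate only sees docs the bitmap kept."""
    and_matches = {d for d in bitmap_matches if expensive(d)}
    return set(range(num_docs)) - and_matches


def or_of_nots(num_docs, bitmap_matches, expensive):
    """NOT a OR NOT b: each side is evaluated on its own, over all docs."""
    all_docs = set(range(num_docs))
    not_a = all_docs - bitmap_matches
    not_b = {d for d in all_docs if not expensive(d)}
    return not_a | not_b


NUM_DOCS = 1000
# Stand-in for a selective bitmap-based filter: 10 of 1000 docs match.
bitmap_matches = set(range(0, NUM_DOCS, 100))

# Stand-ins for a scan-based expression; one counter per strategy.
pred_and = CountingPredicate(lambda d: d % 200 == 0)
pred_or = CountingPredicate(lambda d: d % 200 == 0)

result_and = not_of_and(NUM_DOCS, bitmap_matches, pred_and)
result_or = or_of_nots(NUM_DOCS, bitmap_matches, pred_or)

print(result_and == result_or)        # same matching docs either way
print(pred_and.calls, pred_or.calls)  # evaluations of the expensive predicate
```

Both strategies return the same documents, but in this model the first evaluates the expensive predicate only on the documents the bitmap filter kept, while the second evaluates it on every document.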
