suneet-s opened a new pull request #10312: URL: https://github.com/apache/druid/pull/10312
### Description

For large InDimFilters in default mode, the filter does a linear check of the set to see whether it contains either an empty string or null. If it does, the empty strings are converted to nulls by passing through the entire list again.

Instead, in default mode, we now attempt to remove an empty string from the values passed to the InDimFilter. If an empty string was removed, we add null to the set.

<img width="1659" alt="Screen Shot 2020-08-23 at 10 12 30 PM" src="https://user-images.githubusercontent.com/44787917/91006291-d383c880-e58d-11ea-951b-38ec1bc92255.png">

This flame graph shows that ~18% of query time was spent just checking whether a null or empty string exists in the list of values passed to the InDimFilter. This happened on a join query where a filter was pushed down to the base table. The limit for filter push-down was raised to a very large number so that a very large InDimFilter would be generated.

<hr>

This PR has:
- [ ] been self-reviewed.
   - [ ] using the [concurrency checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md) (Remove this item if the PR doesn't have any relation to concurrency.)
- [ ] added documentation for new or modified features or behaviors.
- [ ] added Javadocs for most classes and all non-trivial methods. Linked related entities via Javadoc links.
- [ ] added or updated version, license, or notice information in [licenses.yaml](https://github.com/apache/druid/blob/master/licenses.yaml)
- [ ] added comments explaining the "why" and the intent of the code wherever it would not be obvious for an unfamiliar reader.
- [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met.
- [ ] added integration tests.
- [ ] been tested in a test Druid cluster.
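For illustration, here is a minimal Java sketch contrasting the two approaches from the description: scanning for an empty string or null and then rebuilding the set, versus a single `remove("")` call. It assumes the filter's values are held in a mutable `HashSet<String>`, which permits a `null` element. The class and method names (`InFilterValuesSketch`, `normalizeByScanning`, `normalizeByRemoving`) are hypothetical, and this is not the code from the PR.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

/**
 * Hypothetical sketch (not Druid's InDimFilter code): normalizing IN-filter values
 * when "" and null are treated as the same value, as in default null-handling mode.
 */
class InFilterValuesSketch
{
  /** Old approach: scan for "" or null, then rebuild the whole set if one is found. */
  static Set<String> normalizeByScanning(Set<String> values)
  {
    boolean hasNullOrEmpty = false;
    for (String v : values) {           // first linear pass
      if (v == null || v.isEmpty()) {
        hasNullOrEmpty = true;
        break;
      }
    }
    if (!hasNullOrEmpty) {
      return values;
    }
    Set<String> converted = new HashSet<>();
    for (String v : values) {           // second linear pass over every value
      converted.add(v == null || v.isEmpty() ? null : v);
    }
    return converted;
  }

  /**
   * New approach: one hash lookup via remove(""). Mutates the given set, which is
   * assumed to be mutable and to allow null elements (e.g. a HashSet).
   */
  static Set<String> normalizeByRemoving(Set<String> values)
  {
    if (values.remove("")) {
      // An empty string was present; represent it as null instead.
      values.add(null);
    }
    return values;
  }

  public static void main(String[] args)
  {
    Set<String> scanned = normalizeByScanning(new HashSet<>(Arrays.asList("a", "", "b")));
    Set<String> removed = normalizeByRemoving(new HashSet<>(Arrays.asList("a", "", "b")));
    System.out.println(scanned.equals(removed)); // true
    System.out.println(removed.contains(null));  // true
    System.out.println(removed.contains(""));    // false
  }
}
```

The design point is that `HashSet.remove` is an expected constant-time hash lookup, so the normalization cost no longer grows with the number of values in the filter. The trade-off in this sketch is that the set is modified in place rather than copied, and it would not work with a set type that rejects `null` (for example, a `TreeSet` using natural ordering).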
This is an automated message from the Apache Git Service.
