suneet-s opened a new pull request #10312:
URL: https://github.com/apache/druid/pull/10312


   ### Description
   
   For large InDimFilters in default mode, the filter does a linear scan of the
   set of values to check whether it contains an empty string or a null. If it
   does, the filter passes through the entire list again to convert the empty
   strings to nulls.
   
   Instead, in default mode, we now attempt to remove the empty string from the
   values that are passed to the InDimFilter. If an empty string was removed, we
   add null to the set.
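
As a rough illustration of the idea above (not the actual patch), the sketch below assumes the filter values are already held in a mutable hash-based `Set<String>`, so a single `remove("")` call stands in for the scan plus the second pass. The class name `InDimFilterValuesSketch` and the method `replaceEmptyWithNull` are hypothetical and are not part of Druid.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch; not Druid's InDimFilter implementation.
final class InDimFilterValuesSketch {

    // Replaces the empty string with null in place.
    // Assumes `values` is a mutable set that permits null (e.g. HashSet).
    // Returns true if an empty string was found and replaced.
    static boolean replaceEmptyWithNull(Set<String> values) {
        // Set.remove returns true only if the element was present,
        // so no separate containment scan is needed.
        if (values.remove("")) {
            values.add(null);
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        Set<String> values = new HashSet<>(Arrays.asList("a", "", "b"));
        boolean replaced = replaceEmptyWithNull(values);
        System.out.println(replaced + " " + values.contains(null) + " " + values.contains(""));
    }
}
```

With a hash-based set, the `remove` call is expected to take roughly constant time, which is why this shape avoids the cost that grows with the number of filter values.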
   
   <img width="1659" alt="Screen Shot 2020-08-23 at 10 12 30 PM" src="https://user-images.githubusercontent.com/44787917/91006291-d383c880-e58d-11ea-951b-38ec1bc92255.png">
   
   This flame graph shows that ~18% of query time was spent just checking
   whether a null or empty string exists in the list of values passed to the
   InDimFilter. This happened on a join query where a filter was pushed down to
   the base table. The limit for filter push down was increased to a very large
   number so that a very large InDimFilter could be generated.
   
   <hr>
   
   This PR has:
   - [ ] been self-reviewed.
      - [ ] using the [concurrency 
checklist](https://github.com/apache/druid/blob/master/dev/code-review/concurrency.md)
 (Remove this item if the PR doesn't have any relation to concurrency.)
   - [ ] added documentation for new or modified features or behaviors.
   - [ ] added Javadocs for most classes and all non-trivial methods. Linked 
related entities via Javadoc links.
   - [ ] added or updated version, license, or notice information in 
[licenses.yaml](https://github.com/apache/druid/blob/master/licenses.yaml)
   - [ ] added comments explaining the "why" and the intent of the code wherever it would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, 
ensuring the threshold for [code 
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
 is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   


