clintropolis opened a new pull request #11186:
URL: https://github.com/apache/druid/pull/11186


   ### Description
   This PR adjusts the heap based topN algorithm to only use known the "known" 
cardinality path only when the dictionary contains unique values. The known 
cardinality path to aggregate values uses an array based approach, where an 
array of aggregator arrays the size of the value cardinality is created, and 
the dictionaryId is expected to index to an array position with the aggregators 
for that value, as an optimization to avoid a map lookup. 
   
   However, when a selector is aggregated which does not have unique 
dictionaryIds, but does know its cardinality, such as a selector from an 
`IndexedTable` from a join result which uses the row number as the dictionaryId 
instead, it means that each dictionaryId will be 'new', and thus have a null 
array entry and still incur the map interaction this path is trying to avoid.
   
   Instead, these selectors will now just use the map directly by using the 
cardinality "unknown" path instead.
   
   
   This PR has:
   - [x] been self-reviewed.
   - [x] added comments explaining the "why" and the intent of the code 
wherever would not be obvious for an unfamiliar reader.
   - [ ] added unit tests or modified existing tests to cover new code paths, 
ensuring the threshold for [code 
coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md)
 is met.
   - [ ] added integration tests.
   - [ ] been tested in a test Druid cluster.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to