clintropolis opened a new pull request #11186: URL: https://github.com/apache/druid/pull/11186
### Description This PR adjusts the heap based topN algorithm to only use known the "known" cardinality path only when the dictionary contains unique values. The known cardinality path to aggregate values uses an array based approach, where an array of aggregator arrays the size of the value cardinality is created, and the dictionaryId is expected to index to an array position with the aggregators for that value, as an optimization to avoid a map lookup. However, when a selector is aggregated which does not have unique dictionaryIds, but does know its cardinality, such as a selector from an `IndexedTable` from a join result which uses the row number as the dictionaryId instead, it means that each dictionaryId will be 'new', and thus have a null array entry and still incur the map interaction this path is trying to avoid. Instead, these selectors will now just use the map directly by using the cardinality "unknown" path instead. This PR has: - [x] been self-reviewed. - [x] added comments explaining the "why" and the intent of the code wherever would not be obvious for an unfamiliar reader. - [ ] added unit tests or modified existing tests to cover new code paths, ensuring the threshold for [code coverage](https://github.com/apache/druid/blob/master/dev/code-review/code-coverage.md) is met. - [ ] added integration tests. - [ ] been tested in a test Druid cluster. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
