jaylmiller opened a new pull request, #5554: URL: https://github.com/apache/arrow-datafusion/pull/5554
# Which issue does this PR close?

Closes #258.

# Rationale for this change

The count distinct physical expression was doing a lot of unnecessary hashing when run on dictionary types. Instead, we can track the seen indices (keys) in an array, so no hashing is required.

# What changes are included in this PR?

A new accumulator (`CountDistinctDictAccumulator`) that `DistinctCount` returns when a dictionary array is being counted. If the input is not a dictionary array, it falls back to the default accumulator (`DistinctCountAccumulator`).

# Are these changes tested?

Added some new unit tests.

# Are there any user-facing changes?
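
As a rough illustration of the idea in the rationale, the sketch below marks seen dictionary keys in a boolean array instead of hashing values. It is not the code from this PR: `SeenKeys` and `count_distinct_keys` are hypothetical names, keys are modeled as plain `Option<usize>` rather than an Arrow dictionary array, and it assumes the dictionary values are unique, so that distinct keys correspond to distinct values.

```rust
/// Hypothetical sketch (not the PR's implementation): records which
/// dictionary keys have been observed, one flag per dictionary entry.
struct SeenKeys {
    seen: Vec<bool>,
}

impl SeenKeys {
    /// `dict_len` is the number of entries in the dictionary's values array.
    fn new(dict_len: usize) -> Self {
        Self { seen: vec![false; dict_len] }
    }

    /// Marks every non-null key as seen. No hashing is involved:
    /// each key is used directly as an index into `seen`.
    fn update(&mut self, keys: &[Option<usize>]) {
        for key in keys.iter().flatten() {
            // Assumption: keys outside the dictionary are silently ignored.
            if let Some(slot) = self.seen.get_mut(*key) {
                *slot = true;
            }
        }
    }

    /// Number of distinct keys seen so far. This equals the number of
    /// distinct values only if the dictionary values are unique.
    fn count(&self) -> usize {
        self.seen.iter().filter(|&&s| s).count()
    }
}

/// Convenience wrapper for a single batch of keys.
fn count_distinct_keys(keys: &[Option<usize>], dict_len: usize) -> usize {
    let mut acc = SeenKeys::new(dict_len);
    acc.update(keys);
    acc.count()
}
```

For example, with a three-entry dictionary and keys `[Some(0), Some(2), None, Some(0)]`, only entries 0 and 2 are marked, and the null key is skipped, as `COUNT(DISTINCT ...)` ignores nulls.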
