drin opened a new pull request, #13583: URL: https://github.com/apache/arrow/pull/13583
This addresses a bug where the `count_distinct` function simply added counts when merging state. The correct logic is to return the number of distinct elements after both states have been merged.

State for `count_distinct` is backed by a MemoTable, which is in turn backed by a HashTable. To properly merge state, this PR adds two functions to each MemoTable: `MaybeInsert` and `MergeTable`. `MaybeInsert` handles simplified logic for inserting an element into the MemoTable. `MergeTable` handles iteration over the elements of the MemoTable _to be merged_.

This PR also adds an R test and a C++ test. The R test mirrors what was provided in ARROW-16807. The C++ test, `AllChunkedArrayTypesWithNulls`, mirrors another C++ test, `AllArrayTypesWithNulls`, but uses chunked arrays for test data.
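To illustrate why adding counts is wrong, here is a minimal sketch of the two merge strategies. It is not the Arrow implementation: `DistinctState` is a hypothetical type, `std::unordered_set` stands in for the MemoTable, and the `MaybeInsert` and `MergeTable` signatures shown are assumptions that only borrow the names used in this PR.

```cpp
#include <cstdint>
#include <unordered_set>

// Hypothetical stand-in for the per-chunk count_distinct state.
// std::unordered_set plays the role of the MemoTable (assumption).
struct DistinctState {
  std::unordered_set<int64_t> seen;

  // Insert a value only if it is not already present.
  // Returns true when the value was new.
  bool MaybeInsert(int64_t value) { return seen.insert(value).second; }

  // Iterate over the elements of the state to be merged and
  // insert each one into this state.
  void MergeTable(const DistinctState& other) {
    for (int64_t value : other.seen) {
      MaybeInsert(value);
    }
  }

  int64_t Count() const { return static_cast<int64_t>(seen.size()); }
};

// Buggy merge: summing counts double-counts values seen in both chunks.
int64_t BuggyMergedCount(const DistinctState& a, const DistinctState& b) {
  return a.Count() + b.Count();
}

// Corrected merge: union the tables first, then count.
// `a` is taken by value so the caller's state is left untouched.
int64_t MergedCount(DistinctState a, const DistinctState& b) {
  a.MergeTable(b);
  return a.Count();
}
```

For two chunks holding {1, 2, 3} and {3, 4}, `BuggyMergedCount` yields 5 while `MergedCount` yields 4, because the value 3 appears in both chunks.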
