drin opened a new pull request, #13583:
URL: https://github.com/apache/arrow/pull/13583

   This fixes a bug in which the `count_distinct` function simply summed the 
per-state counts when merging state, so a value present in more than one state 
was counted more than once. The correct result is the number of distinct 
elements after both states have been merged.
   
   State for `count_distinct` is backed by a MemoTable, which is in turn backed 
by a HashTable. To merge state properly, this PR adds two functions to each 
MemoTable: `MaybeInsert` and `MergeTable`. `MaybeInsert` handles simplified 
logic for inserting an element into the MemoTable. `MergeTable` handles 
iteration over the elements of the MemoTable _to be merged_.
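
   As a rough illustration of the difference between summing counts and merging 
state, here is a minimal standalone sketch. It does not use Arrow's MemoTable or 
HashTable: `DistinctState`, `InsertIfAbsent`, `MergeFrom`, and `Consume` are 
hypothetical names, and a `std::unordered_set` stands in for the memo table. 
The two member functions are only loosely analogous to `MaybeInsert` and 
`MergeTable`.

```cpp
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical stand-in for the per-chunk count_distinct state.
// Assumption: a std::unordered_set plays the role of the MemoTable here.
struct DistinctState {
  std::unordered_set<int64_t> seen;

  // Insert a value only if it is not already present
  // (loosely analogous to MaybeInsert).
  void InsertIfAbsent(int64_t value) { seen.insert(value); }

  // Iterate over the elements of the state to be merged and insert each one
  // (loosely analogous to MergeTable).
  void MergeFrom(const DistinctState& other) {
    for (int64_t value : other.seen) {
      InsertIfAbsent(value);
    }
  }

  // Number of distinct values observed so far.
  int64_t Count() const { return static_cast<int64_t>(seen.size()); }
};

// Build a state from one chunk of values.
DistinctState Consume(const std::vector<int64_t>& chunk) {
  DistinctState state;
  for (int64_t value : chunk) {
    state.InsertIfAbsent(value);
  }
  return state;
}
```

   With chunks `{1, 2, 3}` and `{3, 4}`, adding the two `Count()` results gives 
5, because the value 3 is counted in both states. Calling `MergeFrom` on one 
state with the other and then calling `Count()` gives 4.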
   
   This PR also adds an R test and a C++ test. The R test mirrors the case 
provided in ARROW-16807. The C++ test, `AllChunkedArrayTypesWithNulls`, mirrors 
an existing C++ test, `AllArrayTypesWithNulls`, but uses chunked arrays for its 
test data.

