[ 
https://issues.apache.org/jira/browse/ARROW-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16250843#comment-16250843
 ] 

ASF GitHub Bot commented on ARROW-1559:
---------------------------------------

wesm commented on issue #1266: WIP: ARROW-1559: Add unique kernel
URL: https://github.com/apache/arrow/pull/1266#issuecomment-344145502
 
 
   I started pulling on threads and the whole thing unraveled. I needed to add 
a variant output type (`arrow::compute::Datum`), and I wanted to change unique 
and dictionary-encode so that both are kernels. I'm close to having everything 
working again. I will spend the next day or so rewriting the unit tests to 
work with the new code structure and then making sure we don't have any 
obvious regressions.
   
   One annoyance: when you dictionary-encode a chunked array, you may 
not have seen all the unique values yet, so the output integer type may change 
as you observe more chunks. As a result, for the time being I think it is best 
if we dictionary-encode everything to int32 instead of using the adaptive 
integer builder. If we want to reduce the space used by the indices, we can 
revisit that after this patch.
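
   To illustrate the chunking problem, here is a minimal standalone sketch. 
It does not use the Arrow API; `SmallestIndexWidth`, `EncodedChunk`, and 
`EncodeChunks` are hypothetical names invented for this example, and the 
width thresholds assume signed index types. It encodes chunks one at a time 
against a shared, growing dictionary and records the index width an adaptive 
builder would need after each chunk.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical helper: smallest signed integer width (in bits) that can hold
// every code in 0..num_values-1. Assumes the dictionary fits in int32.
inline int SmallestIndexWidth(std::size_t num_values) {
  if (num_values <= 128) return 8;      // codes 0..127 fit in int8
  if (num_values <= 32768) return 16;   // codes 0..32767 fit in int16
  return 32;
}

// Hypothetical result type for one encoded chunk.
struct EncodedChunk {
  std::vector<std::int32_t> indices;  // codes always stored as int32
  int adaptive_width;  // width an adaptive builder would need at this point
};

// Encode the chunks in order against one shared dictionary. The dictionary
// grows as new values are seen, so the minimal width can differ per chunk.
std::vector<EncodedChunk> EncodeChunks(
    const std::vector<std::vector<std::string>>& chunks,
    std::vector<std::string>* dictionary) {
  assert(dictionary != nullptr && dictionary->empty());
  std::unordered_map<std::string, std::int32_t> lookup;
  std::vector<EncodedChunk> out;
  for (const auto& chunk : chunks) {
    EncodedChunk encoded;
    encoded.indices.reserve(chunk.size());
    for (const auto& value : chunk) {
      auto it = lookup.find(value);
      if (it == lookup.end()) {
        // First time this value is seen: assign the next code.
        const auto code = static_cast<std::int32_t>(dictionary->size());
        it = lookup.emplace(value, code).first;
        dictionary->push_back(value);
      }
      encoded.indices.push_back(it->second);
    }
    encoded.adaptive_width = SmallestIndexWidth(dictionary->size());
    out.push_back(std::move(encoded));
  }
  return out;
}
```

   With two chunks of 100 distinct values each, the first chunk would fit in 
8-bit indices, but the second needs 16-bit indices, so the chunks would end up 
with different index types. Always emitting int32 keeps every chunk's type the 
same at the cost of some space.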

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


> [C++] Kernel implementations for "unique" (compute distinct elements of array)
> ------------------------------------------------------------------------------
>
>                 Key: ARROW-1559
>                 URL: https://issues.apache.org/jira/browse/ARROW-1559
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++
>            Reporter: Wes McKinney
>            Assignee: Uwe L. Korn
>              Labels: Analytics, pull-request-available
>             Fix For: 0.8.0
>
>




--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
