[ https://issues.apache.org/jira/browse/ARROW-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16250843#comment-16250843 ]
ASF GitHub Bot commented on ARROW-1559:
---------------------------------------

wesm commented on issue #1266: WIP: ARROW-1559: Add unique kernel
URL: https://github.com/apache/arrow/pull/1266#issuecomment-344145502

   I started pulling on threads and the whole thing unraveled. I needed to add a variant output type (`arrow::compute::Datum`), and I wanted to change unique and dictionary-encode to both be kernels. I'm close to having everything working again. I will spend the next day or so rewriting the unit tests to work with the new code structure, and then making sure we don't have any obvious regressions.

   One annoying matter: when you dictionary-encode a chunked array, you may not have seen all the unique values yet, so the output integer type may change as you observe more chunks. As a result, for the time being I think it is best to dictionary-encode everything to int32 instead of using the adaptive integer builder. If we want to optimize for space, we can revisit that after this patch.

> [C++] Kernel implementations for "unique" (compute distinct elements of array)
> ------------------------------------------------------------------------------
>
>                 Key: ARROW-1559
>                 URL: https://issues.apache.org/jira/browse/ARROW-1559
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++
>            Reporter: Wes McKinney
>            Assignee: Uwe L. Korn
>              Labels: Analytics, pull-request-available
>             Fix For: 0.8.0
>
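
The chunked-array concern in the comment above can be illustrated with a small standalone sketch. It does not use the Arrow API: `ChunkedDictEncoder`, `EncodedChunk`, and `MinIndexBits` are hypothetical names invented for this illustration, and the real kernel in the pull request may be structured quite differently. The sketch encodes chunks one at a time against a shared, growing dictionary, always emits int32 indices, and records the index width an adaptive builder would have needed after each chunk.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical helper (not an Arrow API): smallest signed integer width, in
// bits, that can hold dictionary indices 0 .. num_unique - 1.
inline int MinIndexBits(std::size_t num_unique) {
  if (num_unique <= 128) return 8;      // largest index 127 fits in int8
  if (num_unique <= 32768) return 16;   // largest index 32767 fits in int16
  if (num_unique <= 2147483648ULL) return 32;
  return 64;
}

// Result of encoding one chunk (hypothetical type for this sketch).
struct EncodedChunk {
  std::vector<int32_t> indices;  // fixed int32 indices, as the comment proposes
  int adaptive_bits;             // width an adaptive builder would need so far
};

// Encodes chunks one at a time; the dictionary grows as new values appear.
class ChunkedDictEncoder {
 public:
  EncodedChunk Encode(const std::vector<std::string>& chunk) {
    EncodedChunk out;
    out.indices.reserve(chunk.size());
    for (const auto& value : chunk) {
      auto it = memo_.find(value);
      if (it == memo_.end()) {
        // First time this value is seen: append it to the dictionary.
        assert(dictionary_.size() < static_cast<std::size_t>(INT32_MAX));
        const auto index = static_cast<int32_t>(dictionary_.size());
        it = memo_.emplace(value, index).first;
        dictionary_.push_back(value);
      }
      out.indices.push_back(it->second);
    }
    // The required width depends on every value seen so far, not only on
    // the values in this chunk.
    out.adaptive_bits = MinIndexBits(dictionary_.size());
    return out;
  }

  const std::vector<std::string>& dictionary() const { return dictionary_; }

 private:
  std::unordered_map<std::string, int32_t> memo_;
  std::vector<std::string> dictionary_;
};
```

If a first chunk holds 100 distinct values, `adaptive_bits` is 8. If a second chunk then grows the dictionary to 250 values, `adaptive_bits` becomes 16, so adaptively typed chunks would no longer share one index type. The int32 `indices` vectors have the same type for every chunk, which is the trade-off the comment accepts for now.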