[ https://issues.apache.org/jira/browse/ARROW-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16234464#comment-16234464 ]

ASF GitHub Bot commented on ARROW-1559:
---------------------------------------

xhochy commented on issue #1266: WIP: ARROW-1559: Add unique kernel
URL: https://github.com/apache/arrow/pull/1266#issuecomment-341184022
 
 
   At the moment, I also tend to step back a bit and first look at this 
again in a design document. There are several issues on which I have no clear 
opinion yet and which will probably require some thinking:
   
    * Do we need kernel call methods for each level of 
Array/ChunkedArray/Column? Having such methods on each of the three data 
structures, instead of a generic `InvokeUnary`, might lead to a lot of code 
duplication or trivial pass-through functions. On the other hand, a generic 
`InvokeUnary` method would prevent us from doing some optimizations in cases 
where we pass over several arrays in a column and could do some operations 
only once.
    * My use case here is selective categorical conversion. My initial 
approach was to implement `unique(column)` and then use the result to create a 
`DictionaryType` instance that would be fed to all underlying arrays to 
perform the categorical conversion. This might not be the best solution, as the 
`DictionaryType` instance no longer contains the hash map, which would then 
have to be reconstructed.
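The trade-off in the first point can be made concrete with a small sketch. It uses plain `std::vector` stand-ins for Array and ChunkedArray (hypothetical, not Arrow's classes) and a hypothetical `InvokeUnary` that simply forwards an array-level kernel to each chunk. For `unique`, such a pass-through emits values that repeat across chunks, so a second deduplication pass would be needed; a chunk-aware variant sharing one hash set does the work once.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Hypothetical stand-ins for Arrow's Array and ChunkedArray; not the Arrow API.
using Array = std::vector<int64_t>;
using ChunkedArray = std::vector<Array>;

// Array-level unique: keeps the first occurrence of each value, in order.
Array Unique(const Array& values) {
  std::unordered_set<int64_t> seen;
  Array out;
  for (int64_t v : values) {
    if (seen.insert(v).second) out.push_back(v);
  }
  return out;
}

// Hypothetical generic pass-through: applies an array-level kernel to each
// chunk independently and concatenates the per-chunk results.
template <typename Kernel>
Array InvokeUnary(Kernel kernel, const ChunkedArray& chunks) {
  Array out;
  for (const Array& chunk : chunks) {
    Array partial = kernel(chunk);
    out.insert(out.end(), partial.begin(), partial.end());
  }
  return out;
}

// Chunk-aware unique: one hash set shared across all chunks, so a value
// repeated in different chunks is emitted only once, in a single pass.
Array UniqueChunked(const ChunkedArray& chunks) {
  std::unordered_set<int64_t> seen;
  Array out;
  for (const Array& chunk : chunks) {
    for (int64_t v : chunk) {
      if (seen.insert(v).second) out.push_back(v);
    }
  }
  return out;
}
```

With chunks `{1, 2, 2}` and `{2, 3}`, the pass-through yields `{1, 2, 2, 3}` while the chunk-aware version yields `{1, 2, 3}`; the price of the latter is a ChunkedArray-specific entry point.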
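The concern in the second point, rebuilding the hash map after `unique`, could be avoided by a stateful encoder that keeps its memo table. The following is a minimal sketch in plain C++, not Arrow code; `DictionaryEncoder`, `ConsumeUnique` and `Encode` are hypothetical names chosen for illustration. The map filled while collecting distinct values is reused when encoding each chunk, so it is built only once per column.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for one chunk of a string column; not the Arrow API.
using Chunk = std::vector<std::string>;

// Hypothetical stateful encoder: the memo table built while computing the
// unique values is retained, so encoding reuses it instead of rebuilding it.
class DictionaryEncoder {
 public:
  // Adds the distinct values of one chunk to the dictionary.
  void ConsumeUnique(const Chunk& chunk) {
    for (const std::string& v : chunk) GetOrInsert(v);
  }

  // Encodes a chunk as indices into the dictionary, using the retained
  // hash map (values not seen before are appended to the dictionary).
  std::vector<int32_t> Encode(const Chunk& chunk) {
    std::vector<int32_t> indices;
    indices.reserve(chunk.size());
    for (const std::string& v : chunk) indices.push_back(GetOrInsert(v));
    return indices;
  }

  const std::vector<std::string>& dictionary() const { return dictionary_; }

 private:
  int32_t GetOrInsert(const std::string& v) {
    auto it = memo_.find(v);
    if (it != memo_.end()) return it->second;
    int32_t index = static_cast<int32_t>(dictionary_.size());
    memo_.emplace(v, index);
    dictionary_.push_back(v);
    return index;
  }

  std::unordered_map<std::string, int32_t> memo_;  // value -> dictionary index
  std::vector<std::string> dictionary_;            // distinct values, first-seen order
};
```

For a column with chunks `{"a", "b", "a"}` and `{"c", "a"}`, the dictionary is `{"a", "b", "c"}` and the chunks encode to `{0, 1, 0}` and `{2, 0}`. Keeping the state in the kernel avoids the rebuild, at the cost of the kernel no longer being a pure function of its input.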
   
   Also, do we have a general design document for the kernels? We need to 
think about state, parallelisation, etc. in general. I might have missed it, 
but I think having it integrated into the Arrow documentation would ease entry 
for future contributors (and myself).

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.


> [C++] Kernel implementations for "unique" (compute distinct elements of array)
> ------------------------------------------------------------------------------
>
>                 Key: ARROW-1559
>                 URL: https://issues.apache.org/jira/browse/ARROW-1559
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++
>            Reporter: Wes McKinney
>            Assignee: Uwe L. Korn
>            Priority: Major
>              Labels: Analytics, pull-request-available
>             Fix For: 0.8.0
>
>

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
