[ 
https://issues.apache.org/jira/browse/ARROW-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16252896#comment-16252896
 ] 

ASF GitHub Bot commented on ARROW-1559:
---------------------------------------

wesm commented on issue #1266: ARROW-1559: [C++] Add Unique kernel and refactor 
DictionaryBuilder to be a stateful kernel
URL: https://github.com/apache/arrow/pull/1266#issuecomment-344476480
 
 
   OK, I'm finally ready for some code review on this.
   
   So we have one matter to resolve that is a bit annoying. I created an 
`arrow::compute::Datum` type that is based on `boost::variant`. Fundamentally, 
we need such a variant type in our compute kernels to be able to operate on 
different types of data: scalars, arrays, chunked arrays, etc. As an example, 
we would like to be able to add two `Datum` values together, each of which 
might be a scalar value (object model TBD) or an array.
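
   As a rough illustration of the idea (not code from the PR), the sketch below 
models a `Datum` that holds either a scalar or an array and adds two of them by 
visiting both. Everything in it is an assumption made for the example: `Scalar`, 
`Array`, `AddVisitor` and `Add` are hypothetical names, the element type is fixed 
to `int64_t`, and C++17 `std::variant` stands in for `boost::variant`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <variant>
#include <vector>

// Hypothetical stand-ins for illustration; not Arrow's real classes.
struct Scalar {
  int64_t value;
};
struct Array {
  std::vector<int64_t> values;
};

// Assumption: std::variant (C++17) stands in for boost::variant here.
using Datum = std::variant<Scalar, std::shared_ptr<Array>>;

// One overload per combination of alternatives; each returns a Datum.
struct AddVisitor {
  using ArrayPtr = std::shared_ptr<Array>;

  Datum operator()(const Scalar& a, const Scalar& b) const {
    return Scalar{a.value + b.value};
  }
  Datum operator()(const Scalar& a, const ArrayPtr& b) const {
    return AddScalar(*b, a.value);
  }
  Datum operator()(const ArrayPtr& a, const Scalar& b) const {
    return AddScalar(*a, b.value);
  }
  Datum operator()(const ArrayPtr& a, const ArrayPtr& b) const {
    // Element-wise addition only makes sense for equal lengths.
    assert(a->values.size() == b->values.size());
    auto out = std::make_shared<Array>(*a);
    for (std::size_t i = 0; i < out->values.size(); ++i) {
      out->values[i] += b->values[i];
    }
    return out;
  }

  // Adds the scalar to every element of a copy of the array.
  static Datum AddScalar(const Array& arr, int64_t scalar) {
    auto out = std::make_shared<Array>(arr);
    for (auto& v : out->values) v += scalar;
    return out;
  }
};

// Dispatches on the runtime alternative held by each operand.
Datum Add(const Datum& left, const Datum& right) {
  return std::visit(AddVisitor{}, left, right);
}
```

   With this shape, a caller can write `Add(Scalar{2}, some_array)` without 
knowing ahead of time which alternative each operand holds.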
   
   I suspect that we do not want to have boost in our public headers. So we 
have a decision to make here:
   
   * We could vendor an ASL 2.0-compatible header-only `boost::variant` 
replacement like https://github.com/mapbox/variant. We're lucky that one exists.
   * We could refactor `Datum` to use a PIMPL. That seems pretty heavy-weight for 
such a tiny struct -- in that case even a stack-allocated `Datum` would 
require a heap allocation, which is not free when you have a lot of these 
things.
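
   To make the PIMPL cost concrete, here is a minimal sketch under stated 
assumptions: `PimplDatum` and `InlineDatum` are hypothetical names, the payload 
is reduced to `int64_t` or `double`, and `std::variant` again stands in for 
`boost::variant`. The PIMPL version keeps the variant out of the public header 
but allocates on every construction; the inline version stores the variant in 
the object itself.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <variant>

// --- What a public header could look like with PIMPL (hypothetical) ---
class PimplDatum {
 public:
  explicit PimplDatum(int64_t scalar);
  ~PimplDatum();
  int64_t scalar() const;

 private:
  struct Impl;                  // defined out of line; variant type stays hidden
  std::unique_ptr<Impl> impl_;  // every PimplDatum owns a heap-allocated Impl
};

// --- What would live in the .cc file ---
struct PimplDatum::Impl {
  std::variant<int64_t, double> value;
};

// Even when the PimplDatum itself is on the stack, this `new` hits the heap.
PimplDatum::PimplDatum(int64_t scalar) : impl_(new Impl{scalar}) {}
PimplDatum::~PimplDatum() = default;

int64_t PimplDatum::scalar() const {
  assert(std::holds_alternative<int64_t>(impl_->value));
  return std::get<int64_t>(impl_->value);
}

// --- Alternative: the variant is stored inline, so no allocation is needed,
// but the variant header must be visible wherever InlineDatum is used. ---
struct InlineDatum {
  std::variant<int64_t, double> value;
};
```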
   
   I'm inclined to take the former approach: mapbox/variant is ~1200 lines of 
headers, seems intended to be a drop-in replacement for `boost::variant`, and is 
BSD-3 licensed, so it is probably not the worst dependency to take on. 
   
   cc @xhochy @cpcloud 



> [C++] Kernel implementations for "unique" (compute distinct elements of array)
> ------------------------------------------------------------------------------
>
>                 Key: ARROW-1559
>                 URL: https://issues.apache.org/jira/browse/ARROW-1559
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++
>            Reporter: Wes McKinney
>            Assignee: Uwe L. Korn
>              Labels: Analytics, pull-request-available
>             Fix For: 0.8.0
>
>



