[ https://issues.apache.org/jira/browse/ARROW-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16252896#comment-16252896 ]
ASF GitHub Bot commented on ARROW-1559:
---------------------------------------

wesm commented on issue #1266: ARROW-1559: [C++] Add Unique kernel and refactor DictionaryBuilder to be a stateful kernel
URL: https://github.com/apache/arrow/pull/1266#issuecomment-344476480

   OK, I'm finally ready for some code review on this.

   So we have one matter to resolve that is a bit annoying. I created an `arrow::compute::Datum` type that is based on `boost::variant`. Fundamentally, we need such a variant type in our compute kernels to be able to operate on different types of data: scalars, arrays, chunked arrays, etc. As an example, we would like to be able to add two `Datum` values together, each of which might be a scalar value (object model TBD) or an array.

   I suspect that we do not want to have boost in our public headers. So we have a decision to make here:

   * We could vendor an ASL 2.0-compatible header-only `boost::variant` replacement like https://github.com/mapbox/variant. We're lucky that one exists.
   * We could refactor `Datum` to use a PIMPL. That seems pretty heavyweight for such a tiny struct: in that case even a stack-allocated `Datum` would require a heap allocation, which is not free when you have a lot of these things.

   I'm inclined to take the former approach. mapbox/variant is ~1200 lines of headers, seems intended as a drop-in replacement for `boost::variant`, and is BSD-3 licensed, so it is probably not the worst dependency to take on.

   cc @xhochy @cpcloud

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

> [C++] Kernel implementations for "unique" (compute distinct elements of array)
> ------------------------------------------------------------------------------
>
>                 Key: ARROW-1559
>                 URL: https://issues.apache.org/jira/browse/ARROW-1559
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++
>            Reporter: Wes McKinney
>            Assignee: Uwe L. Korn
>              Labels: Analytics, pull-request-available
>             Fix For: 0.8.0
>
>

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
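
For readers following the `Datum` discussion in the comment above, the variant-based option might look roughly like the sketch below. It uses C++17 `std::variant` purely as a stand-in for `boost::variant` or mapbox/variant. The `sketch` namespace, the `Scalar` and `Array` structs, and the `Add` function are hypothetical names made up for illustration and are not Arrow's actual API (the comment itself says the scalar object model is TBD).

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <utility>
#include <variant>
#include <vector>

namespace sketch {

// Hypothetical stand-ins for illustration only; not Arrow's real classes.
struct Scalar { double value; };
struct Array { std::vector<double> values; };

// A Datum holds either a scalar or a shared array. The variant is stored
// inline, so constructing a Datum on the stack does not allocate by itself.
struct Datum {
  std::variant<Scalar, std::shared_ptr<Array>> value;

  Datum(Scalar s) : value(s) {}
  Datum(std::shared_ptr<Array> a) : value(std::move(a)) {}

  bool is_scalar() const { return std::holds_alternative<Scalar>(value); }
  bool is_array() const {
    return std::holds_alternative<std::shared_ptr<Array>>(value);
  }
};

// Visitor with one overload per (left, right) combination of alternatives.
struct AddVisitor {
  using ArrayPtr = std::shared_ptr<Array>;

  Datum operator()(const Scalar& a, const Scalar& b) const {
    return Datum(Scalar{a.value + b.value});
  }
  Datum operator()(const Scalar& a, const ArrayPtr& b) const {
    return Broadcast(a.value, *b);
  }
  Datum operator()(const ArrayPtr& a, const Scalar& b) const {
    return Broadcast(b.value, *a);
  }
  Datum operator()(const ArrayPtr& a, const ArrayPtr& b) const {
    if (a->values.size() != b->values.size()) {
      throw std::invalid_argument("array lengths differ");
    }
    auto out = std::make_shared<Array>();
    out->values.reserve(a->values.size());
    for (std::size_t i = 0; i < a->values.size(); ++i) {
      out->values.push_back(a->values[i] + b->values[i]);
    }
    return Datum(out);
  }

  // Adds a scalar to every element of an array.
  static Datum Broadcast(double s, const Array& arr) {
    auto out = std::make_shared<Array>();
    out->values.reserve(arr.values.size());
    for (double v : arr.values) out->values.push_back(v + s);
    return Datum(out);
  }
};

// Dispatches on the runtime alternative held by each operand.
inline Datum Add(const Datum& left, const Datum& right) {
  return std::visit(AddVisitor{}, left.value, right.value);
}

}  // namespace sketch
```

Calling `sketch::Add` on two scalar-valued `Datum` objects yields a scalar, and mixing a scalar with an array yields an array. The only heap allocations here belong to the array data. A PIMPL-based `Datum` would additionally allocate its implementation object each time one is constructed, which is the cost the comment is concerned about.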