[
https://issues.apache.org/jira/browse/ARROW-1559?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16252895#comment-16252895
]
ASF GitHub Bot commented on ARROW-1559:
---------------------------------------
wesm commented on issue #1266: ARROW-1559: [C++] Add Unique kernel and refactor
DictionaryBuilder to be a stateful kernel
URL: https://github.com/apache/arrow/pull/1266#issuecomment-344476480
OK, I'm finally ready for some code review on this.
So we have one matter to resolve that is a bit annoying. I created a
`arrow::compute::Datum` type that is based on `boost::variant`. Fundamentally,
we need such a variant type in our compute kernels to be able to operate on
different types of data: scalars, arrays, chunked arrays, etc.
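To make the idea concrete, here is a minimal sketch of what a variant-backed `Datum` could look like. It is not the code in the PR: `std::variant` (C++17) stands in for `boost::variant`, and `Scalar`, `Array`, `ChunkedArray`, and `Length` are simplified, hypothetical placeholders rather than the real Arrow classes.

```cpp
#include <cstddef>
#include <memory>
#include <variant>
#include <vector>

// Hypothetical stand-ins for the Arrow data classes (assumptions, not the real API).
struct Scalar { double value; };
struct Array { std::vector<double> values; };
struct ChunkedArray { std::vector<std::shared_ptr<Array>> chunks; };

// A small tagged union over the kinds of input a kernel may receive.
// std::variant is used here in place of boost::variant.
struct Datum {
  std::variant<std::shared_ptr<Scalar>,
               std::shared_ptr<Array>,
               std::shared_ptr<ChunkedArray>> value;
};

// Kernel-side dispatch: count the elements held by a Datum, whatever its kind.
inline std::size_t Length(const Datum& datum) {
  struct Visitor {
    std::size_t operator()(const std::shared_ptr<Scalar>&) const { return 1; }
    std::size_t operator()(const std::shared_ptr<Array>& arr) const {
      return arr->values.size();
    }
    std::size_t operator()(const std::shared_ptr<ChunkedArray>& chunked) const {
      std::size_t total = 0;
      for (const auto& chunk : chunked->chunks) total += chunk->values.size();
      return total;
    }
  };
  return std::visit(Visitor{}, datum.value);
}
```

The variant is stored inline in the struct, so creating a `Datum` does not allocate anything beyond the data it points to.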
I suspect that we do not want to have boost in our public headers. So we
have a decision to make here:
* We could vendor an ASL 2.0-compatible header-only `boost::variant`
replacement like https://github.com/mapbox/variant. We're lucky that one exists.
* We could refactor `Datum` to use a PIMPL. That seems pretty heavyweight for
such a tiny struct: in that case even a stack-allocated `Datum` would
require a heap allocation, which is not free when you have a lot of these
things.
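For comparison, a rough sketch of the PIMPL alternative and where its cost comes from. The class name `PimplDatum` and its single-vector payload are hypothetical, chosen only to keep the example short; the point is that the public header no longer exposes any variant type, but every construction goes through `new`.

```cpp
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// --- what the public header would contain: no variant library is visible ---
class PimplDatum {
 public:
  explicit PimplDatum(std::vector<double> values);
  ~PimplDatum();
  PimplDatum(PimplDatum&&) noexcept;
  std::size_t length() const;

 private:
  struct Impl;                  // defined only in the .cc file
  std::unique_ptr<Impl> impl_;  // the only data member
};

// --- what the .cc file would contain ---
struct PimplDatum::Impl {
  std::vector<double> values;   // a real Impl would hold the variant here
};

// Each construction performs `new Impl`, even when the PimplDatum
// object itself lives on the stack.
PimplDatum::PimplDatum(std::vector<double> values)
    : impl_(new Impl{std::move(values)}) {}
PimplDatum::~PimplDatum() = default;
PimplDatum::PimplDatum(PimplDatum&&) noexcept = default;

std::size_t PimplDatum::length() const { return impl_->values.size(); }
```

A kernel that creates many short-lived `PimplDatum` objects pays for one extra allocation and deallocation per object, which is the overhead described above.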
I'm inclined to take the former approach. mapbox/variant is ~1200 lines of
headers, aims to be a drop-in replacement for `boost::variant`, and is
BSD-3 licensed, so it's probably not the worst dependency to take on.
cc @xhochy @cpcloud
> [C++] Kernel implementations for "unique" (compute distinct elements of array)
> ------------------------------------------------------------------------------
>
> Key: ARROW-1559
> URL: https://issues.apache.org/jira/browse/ARROW-1559
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C++
> Reporter: Wes McKinney
> Assignee: Uwe L. Korn
> Labels: Analytics, pull-request-available
> Fix For: 0.8.0
>
>