[
https://issues.apache.org/jira/browse/ARROW-38?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Wes McKinney closed ARROW-38.
-----------------------------
Resolution: Won't Fix
This issue is ill-defined. We will have to address hashing of nested types in
the course of implementing kernels like Unique or hash joins.
> C++: Algorithms for using nested types in a hash table context
> --------------------------------------------------------------
>
> Key: ARROW-38
> URL: https://issues.apache.org/jira/browse/ARROW-38
> Project: Apache Arrow
> Issue Type: New Feature
> Components: C++
> Reporter: Wes McKinney
> Priority: Major
>
> Computing hash values (and performing equality comparisons) for top-level
> slots in nested-type data (for example, computing DISTINCT on a
> {{List<List<Int32>>}}, related: ARROW-32) can be fairly complex.
> Additionally, value slots at any level of the type tree can be null.
> We should explore various algorithms for their performance and memory use in
> practical settings. For example, one can compute a contiguous "record" / byte
> array resulting from a depth-first traversal of a single value slot for the
> purposes of computing a hash value or comparing with another slot. If anyone
> has other ideas from past experiences I would be keen to learn more.
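
The depth-first "record" idea from the quoted description could be sketched
roughly as below. This is only an illustration under stated assumptions, not
Arrow's implementation: the `Value` struct, the tag bytes, and the helpers
`MakeNull`, `MakeInt`, `MakeList`, `EncodeInto`, `Encode` and `CountDistinct`
are all hypothetical names, and a real kernel would read Arrow's offset and
validity buffers rather than a tree of heap-allocated objects. Each slot is
flattened into a byte string (a tag per node, a length prefix per list), so
that nulls at any level and empty lists stay distinguishable, and equal slots
produce equal byte strings that can be hashed or compared directly.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical in-memory model of one value slot; not an Arrow type.
struct Value {
  enum class Kind : std::uint8_t { Null = 0, Int32 = 1, List = 2 };
  Kind kind = Kind::Null;
  std::int32_t scalar = 0;      // meaningful only when kind == Int32
  std::vector<Value> children;  // meaningful only when kind == List
};

Value MakeNull() { return Value{}; }

Value MakeInt(std::int32_t v) {
  Value x;
  x.kind = Value::Kind::Int32;
  x.scalar = v;
  return x;
}

Value MakeList(std::vector<Value> items) {
  Value x;
  x.kind = Value::Kind::List;
  x.children = std::move(items);
  return x;
}

// Append the raw bytes of a trivially copyable value (host byte order,
// which is sufficient for in-process hashing).
template <typename T>
void AppendRaw(std::string* out, T v) {
  char buf[sizeof(T)];
  std::memcpy(buf, &v, sizeof(T));
  out->append(buf, sizeof(T));
}

// Depth-first traversal of one slot into a contiguous byte record.
// Tag byte first, then the payload: nothing for null, 4 bytes for Int32,
// a 32-bit child count followed by the encoded children for a list.
void EncodeInto(const Value& v, std::string* out) {
  out->push_back(static_cast<char>(v.kind));
  switch (v.kind) {
    case Value::Kind::Null:
      break;
    case Value::Kind::Int32:
      AppendRaw(out, v.scalar);
      break;
    case Value::Kind::List:
      AppendRaw(out, static_cast<std::uint32_t>(v.children.size()));
      for (const Value& child : v.children) EncodeInto(child, out);
      break;
  }
}

std::string Encode(const Value& v) {
  std::string out;
  EncodeInto(v, &out);
  return out;
}

// DISTINCT over top-level slots: hash and compare the encoded records.
std::size_t CountDistinct(const std::vector<Value>& slots) {
  std::unordered_set<std::string> seen;
  for (const Value& slot : slots) seen.insert(Encode(slot));
  return seen.size();
}
```

The trade-off in this sketch is memory: every slot is materialized as a
separate byte string, which is simple to hash and compare but may be costly
for large or deeply nested values. Hashing incrementally during the traversal
would avoid the copy, at the price of a separate structural equality check.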