[ https://issues.apache.org/jira/browse/ARROW-38?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Wes McKinney closed ARROW-38.
-----------------------------
    Resolution: Won't Fix

This issue is ill-defined. We will have to address hashing of nested types in the course of implementing kernels like Unique, or hash joins

> C++: Algorithms for using nested types in a hash table context
> --------------------------------------------------------------
>
>                 Key: ARROW-38
>                 URL: https://issues.apache.org/jira/browse/ARROW-38
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++
>            Reporter: Wes McKinney
>            Priority: Major
>
> Computing hash values (and performing equality comparisons) for top-level
> slots in nested-type data (for example, computing DISTINCT on a
> {{List<List<Int32>>}}, related: ARROW-32) can be fairly complex.
> Additionally, value slots at any level of the type tree can be null.
> We should explore various algorithms for their performance and memory use in
> practical settings. For example, one can compute a contiguous "record" / byte
> array resulting from a depth-first traversal of a single value slot for the
> purposes of computing a hash value or comparing with another slot. If anyone
> has other ideas from past experiences I would be keen to learn more.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
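
For illustration only, here is a minimal sketch of the depth-first "record" idea mentioned in the issue description. It does not use Arrow's array classes: the `Value` struct and the helpers `EncodeSlot`, `EncodeRecord`, `HashSlot`, `SlotsEqual` and `CountDistinct` are hypothetical names made up for this example, and the byte layout (one tag byte per slot, a 64-bit length prefix per list, native-endian integers) is an assumption, not anything the project has settled on.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical model of one value slot: null, an Int32, or a list of slots.
// This is NOT Arrow's memory layout; it only stands in for nested data.
struct Value {
  enum class Kind : uint8_t { Null = 0, Int32 = 1, List = 2 };
  Kind kind = Kind::Null;
  int32_t int_value = 0;
  std::vector<Value> children;

  static Value Null() { return Value{}; }
  static Value Int(int32_t v) {
    Value x;
    x.kind = Kind::Int32;
    x.int_value = v;
    return x;
  }
  static Value List(std::vector<Value> items) {
    Value x;
    x.kind = Kind::List;
    x.children = std::move(items);
    return x;
  }
};

inline void AppendBytes(std::string* out, const void* data, std::size_t n) {
  out->append(static_cast<const char*>(data), n);
}

// Depth-first traversal of a single slot, appending to a contiguous record.
void EncodeSlot(const Value& v, std::string* out) {
  // The tag byte makes a null slot distinguishable at any nesting level.
  out->push_back(static_cast<char>(v.kind));
  switch (v.kind) {
    case Value::Kind::Null:
      break;
    case Value::Kind::Int32:
      AppendBytes(out, &v.int_value, sizeof(v.int_value));
      break;
    case Value::Kind::List: {
      // The length prefix keeps [[1], [2]] distinct from [[1, 2]].
      const uint64_t n = v.children.size();
      AppendBytes(out, &n, sizeof(n));
      for (const Value& child : v.children) EncodeSlot(child, out);
      break;
    }
  }
}

std::string EncodeRecord(const Value& v) {
  std::string out;
  EncodeSlot(v, &out);
  return out;
}

// Hash and equality both operate on the flat record.
std::size_t HashSlot(const Value& v) {
  return std::hash<std::string>{}(EncodeRecord(v));
}

bool SlotsEqual(const Value& a, const Value& b) {
  return EncodeRecord(a) == EncodeRecord(b);
}

// DISTINCT over top-level slots, using the records as hash-set keys.
std::size_t CountDistinct(const std::vector<Value>& slots) {
  std::unordered_set<std::string> seen;
  for (const Value& s : slots) seen.insert(EncodeRecord(s));
  return seen.size();
}
```

Calling `CountDistinct` on a vector of `Value::List(...)` slots gives a DISTINCT count for something shaped like `List<List<Int32>>`. The sketch materializes a full record per slot, which is the memory cost the issue asks to weigh against other approaches; because integers are written in native byte order, the records are only meaningful within a single process.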