Wes McKinney created ARROW-38:
---------------------------------
Summary: C++: Algorithms for using nested types in a hash table context
Key: ARROW-38
URL: https://issues.apache.org/jira/browse/ARROW-38
Project: Apache Arrow
Issue Type: New Feature
Components: C++
Reporter: Wes McKinney
Computing hash values (and performing equality comparisons) for top-level slots
in nested-type data (for example, computing DISTINCT on a
{{List<List<Int32>>}}, related: ARROW-32) can be fairly complex. Additionally,
value slots at any level of the type tree can be null.
We should explore various algorithms for their performance and memory use in
practical settings. For example, one can compute a contiguous "record" / byte
array resulting from a depth-first traversal of a single value slot for the
purposes of computing a hash value or comparing with another slot. If anyone
has other ideas from past experience, I would be keen to learn more.
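
As a rough illustration of the "record" idea above, here is a minimal sketch. It assumes a simplified, hypothetical in-memory value model (the Slot struct and the MakeNull / MakeInt / MakeList / EncodeRecord / CountDistinct helpers are invented for this example and are not part of the Arrow C++ API). Each slot is flattened depth-first into a byte string with a one-byte tag (so nulls at any level are distinguishable) and a length prefix on lists (so {{[[1, 2]]}} and {{[[1], [2]]}} encode differently). The resulting bytes serve as the key for both hashing and equality.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// Hypothetical, simplified value model (NOT the Arrow C++ API): a slot is
// either null, an Int32, or a list of child slots.
struct Slot {
  enum class Kind : uint8_t { Null = 0, Int32 = 1, List = 2 };
  Kind kind = Kind::Null;
  int32_t value = 0;
  std::vector<Slot> children;
};

Slot MakeNull() { Slot s; return s; }
Slot MakeInt(int32_t v) { Slot s; s.kind = Slot::Kind::Int32; s.value = v; return s; }
Slot MakeList(std::vector<Slot> c) {
  Slot s;
  s.kind = Slot::Kind::List;
  s.children = std::move(c);
  return s;
}

// Depth-first traversal that appends a self-delimiting encoding of `slot`.
void AppendRecord(const Slot& slot, std::string* out) {
  // Tag byte: distinguishes null from non-null at every level of the tree.
  out->push_back(static_cast<char>(slot.kind));
  switch (slot.kind) {
    case Slot::Kind::Null:
      break;
    case Slot::Kind::Int32: {
      char buf[sizeof(int32_t)];
      std::memcpy(buf, &slot.value, sizeof(buf));
      out->append(buf, sizeof(buf));
      break;
    }
    case Slot::Kind::List: {
      // Length prefix keeps sibling lists from running together.
      const uint32_t n = static_cast<uint32_t>(slot.children.size());
      char buf[sizeof(n)];
      std::memcpy(buf, &n, sizeof(buf));
      out->append(buf, sizeof(buf));
      for (const Slot& child : slot.children) AppendRecord(child, out);
      break;
    }
  }
}

std::string EncodeRecord(const Slot& slot) {
  std::string out;
  AppendRecord(slot, &out);
  return out;
}

// DISTINCT over top-level slots: the byte record is the hash-table key, so
// std::hash<std::string> and operator== provide hashing and comparison.
std::size_t CountDistinct(const std::vector<Slot>& slots) {
  std::unordered_set<std::string> seen;
  for (const Slot& s : slots) seen.insert(EncodeRecord(s));
  return seen.size();
}
```

This trades memory for simplicity: every slot is copied into a temporary record before hashing. An alternative worth measuring would hash incrementally during the traversal and compare slots structurally only on hash collisions.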