[jira] [Commented] (ARROW-3978) [C++] Implement hashing, dictionary-encoding for StructArray

Antoine Pitrou (JIRA) Tue, 23 Apr 2019 08:52:13 -0700


    [ 
https://issues.apache.org/jira/browse/ARROW-3978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16824273#comment-16824273
 ]


Antoine Pitrou commented on ARROW-3978:
---------------------------------------

To implement this efficiently, we would need to split the computation of hash 
values (for an array or morsel) from their use in hashing kernels. It is 
probably possible to hash struct values efficiently by first hashing each of 
the underlying child arrays, then combining the per-row results.
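
A minimal sketch of that idea, using plain std::vector columns rather than 
Arrow arrays. HashColumn, CombineHashes and HashStructRows are hypothetical 
names for illustration, not existing Arrow APIs. The mixing step is an 
arbitrary hash_combine-style formula, and null handling is left out:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Hypothetical helper (not an Arrow API): hash every value of one child
// column independently of how the hashes are used later.
template <typename T>
std::vector<std::uint64_t> HashColumn(const std::vector<T>& values) {
  std::vector<std::uint64_t> hashes;
  hashes.reserve(values.size());
  std::hash<T> hasher;
  for (const T& v : values) {
    hashes.push_back(static_cast<std::uint64_t>(hasher(v)));
  }
  return hashes;
}

// Hypothetical helper: fold one child's hashes into the running per-row
// hashes. The mixing formula is an assumption (hash_combine-style); any
// reasonable mixer would do for this sketch.
void CombineHashes(const std::vector<std::uint64_t>& child,
                   std::vector<std::uint64_t>* row_hashes) {
  assert(child.size() == row_hashes->size());
  for (std::size_t i = 0; i < child.size(); ++i) {
    std::uint64_t& h = (*row_hashes)[i];
    h ^= child[i] + 0x9e3779b97f4a7c15ULL + (h << 6) + (h >> 2);
  }
}

// Hash a two-field "struct array" stored as parallel child columns:
// hash each child first, then combine the results row by row.
std::vector<std::uint64_t> HashStructRows(
    const std::vector<std::int64_t>& ids,
    const std::vector<std::string>& names) {
  assert(ids.size() == names.size());
  std::vector<std::uint64_t> row_hashes(ids.size(), 0);
  CombineHashes(HashColumn(ids), &row_hashes);
  CombineHashes(HashColumn(names), &row_hashes);
  return row_hashes;
}
```

Rows with equal field values get equal combined hashes, so the result could 
feed a hash table lookup without the kernel having to know how the hashes 
were computed. The combination is order-dependent, so the children must 
always be folded in the same field order.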

> [C++] Implement hashing, dictionary-encoding for StructArray
> ------------------------------------------------------------
>
>                 Key: ARROW-3978
>                 URL: https://issues.apache.org/jira/browse/ARROW-3978
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: C++
>            Reporter: Wes McKinney
>            Priority: Major
>             Fix For: 0.14.0
>
>
> This is a central requirement for hash-aggregations such as
> {code}
> SELECT AGG_FUNCTION(expr)
> FROM table
> GROUP BY expr1, expr2, ...
> {code}
> The materialized keys in the GROUP BY section form a struct, which can be 
> incrementally hashed to produce dictionary codes suitable for computing 
> aggregates or any other purpose.
> There are a few subtasks related to this, such as efficiently constructing a 
> record (that can be hashed quickly) to identify each "row" in the struct. 
> Maybe we should start with that.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
