[ 
https://issues.apache.org/jira/browse/ARROW-2653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16689424#comment-16689424
 ] 

Wes McKinney commented on ARROW-2653:
-------------------------------------

I agree with this, with the caveat that in my experience this type of code 
(hashing) is very performance-sensitive. I've been surprised by which changes 
can cause a 30-50% difference in execution time. 

Luckily, there are plenty of hash table implementations in other projects that 
we can look at to see how efficient ours are relative to comparable ones.

> [C++] Refactor hash table support
> ---------------------------------
>
>                 Key: ARROW-2653
>                 URL: https://issues.apache.org/jira/browse/ARROW-2653
>             Project: Apache Arrow
>          Issue Type: Task
>          Components: C++
>    Affects Versions: 0.11.1
>            Reporter: Antoine Pitrou
>            Assignee: Antoine Pitrou
>            Priority: Major
>
> Currently our hash table support is scattered in several places:
>  * {{compute/kernels/hash.cc}}
>  * {{util/hash.h}} and {{util/hash.cc}}
>  * {{builder.cc}} (in the DictionaryBuilder implementation)
> Perhaps we should have something like a type-parameterized hash table class 
> (possibly backed by non-owned memory) with several primitives:
>  * decide allocation size for a given number of items
>  * look up an item
>  * insert an item
>  * decide whether resizing is needed
>  * resize to a new memory area
>  * ...
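For illustration only, the primitives listed in the issue description might look roughly like the following C++17 sketch. It is hypothetical and is not Arrow's API: the class name `HashTableView`, its method names, the linear-probing scheme, the maximum load factor of 1/2, the `int32_t` payload and the use of `std::hash` are all assumptions made for this example. The slot storage is owned by the caller, which is one way to read "backed by non-owned memory". The `DictionaryEncode` helper is likewise hypothetical and is only loosely modeled on what a dictionary builder needs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <optional>
#include <vector>

// Hypothetical sketch: open-addressing (linear probing) hash table that
// operates on caller-owned slot storage. Capacity must be a power of two.
template <typename Key, typename Hash = std::hash<Key>>
class HashTableView {
 public:
  struct Slot {
    Key key;
    int32_t value;  // e.g. a dictionary index
    bool occupied;
  };

  // Primitive: allocation size for n items (power of two, load <= 1/2).
  static size_t CapacityFor(size_t n_items) {
    size_t cap = 8;
    while (cap < 2 * n_items) cap *= 2;
    return cap;
  }

  HashTableView(Slot* slots, size_t capacity) { Reset(slots, capacity); }

  // Primitive: look up an item.
  std::optional<int32_t> Lookup(const Key& key) const {
    const Slot* s = Find(key);
    if (s->occupied) return s->value;
    return std::nullopt;
  }

  // Primitive: insert an item. Returns false if the key is already present.
  bool Insert(const Key& key, int32_t value) {
    Slot* s = Find(key);
    if (s->occupied) return false;
    assert(!NeedsResize() && "call ResizeInto() before inserting");
    s->key = key;
    s->value = value;
    s->occupied = true;
    ++size_;
    return true;
  }

  // Primitive: would one more insertion exceed the load factor of 1/2?
  bool NeedsResize() const { return 2 * (size_ + 1) > capacity_; }

  // Primitive: rehash every entry into a new caller-provided memory area.
  // The caller remains responsible for freeing the old area afterwards.
  void ResizeInto(Slot* new_slots, size_t new_capacity) {
    Slot* old = slots_;
    size_t old_capacity = capacity_;
    Reset(new_slots, new_capacity);
    for (size_t i = 0; i < old_capacity; ++i) {
      if (old[i].occupied) Insert(old[i].key, old[i].value);
    }
  }

  size_t size() const { return size_; }
  size_t capacity() const { return capacity_; }

 private:
  void Reset(Slot* slots, size_t capacity) {
    assert(capacity > 0 && (capacity & (capacity - 1)) == 0);
    slots_ = slots;
    capacity_ = capacity;
    size_ = 0;
    for (size_t i = 0; i < capacity_; ++i) slots_[i].occupied = false;
  }

  // Returns the slot holding `key`, or the empty slot where it would go.
  // Terminates because the load factor keeps at least one slot empty.
  Slot* Find(const Key& key) const {
    size_t i = Hash{}(key) & (capacity_ - 1);
    while (slots_[i].occupied && !(slots_[i].key == key)) {
      i = (i + 1) & (capacity_ - 1);
    }
    return &slots_[i];
  }

  Slot* slots_;
  size_t capacity_;
  size_t size_;
};

// Hypothetical usage: map each value to the index of its first occurrence.
std::vector<int32_t> DictionaryEncode(const std::vector<int64_t>& values) {
  using Table = HashTableView<int64_t>;
  std::vector<Table::Slot> storage(Table::CapacityFor(0));
  Table table(storage.data(), storage.size());
  std::vector<int32_t> indices;
  for (int64_t v : values) {
    if (table.NeedsResize()) {
      std::vector<Table::Slot> bigger(table.capacity() * 2);
      table.ResizeInto(bigger.data(), bigger.size());
      storage.swap(bigger);  // keep the new area; the old one is freed here
    }
    if (auto found = table.Lookup(v)) {
      indices.push_back(*found);
    } else {
      int32_t idx = static_cast<int32_t>(table.size());
      table.Insert(v, idx);
      indices.push_back(idx);
    }
  }
  return indices;
}
```

Keeping the storage outside the class leaves the caller free to decide where the slots live (for example, in a memory pool), at the cost of making growth a two-step operation: allocate a new area, then rehash into it. The hash function and probing strategy here are placeholders; given how performance-sensitive this code is, any real design choice would need benchmarking.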



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
