[jira] [Commented] (DRILL-6825) Applying different hash function according to data types and data size

Boaz Ben-Zvi (JIRA) Fri, 02 Nov 2018 17:42:15 -0700


    [ 
https://issues.apache.org/jira/browse/DRILL-6825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16673860#comment-16673860
 ]


Boaz Ben-Zvi commented on DRILL-6825:
-------------------------------------

A while back we discussed changing how hash functions are used: instead of 
generating code, make a virtual call that computes the hash value for each 
type of vector (similar to `copyEntry()` in `ValueVector`).

The hash value for a row would then be computed by iterating over the key 
columns (similar to `appendRow()` in `VectorContainer`, though we would need 
to know which columns belong to the key).

This would also remove the hash value computation from the HashTable.
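
A minimal sketch of that idea, to make the shape concrete. All names here (`HashableVector`, `hashEntry()`, `IntVectorSketch`, `VarCharVectorSketch`, `RowHasher`) are hypothetical and are not part of Drill's `ValueVector` or `VectorContainer` API; the per-type hash bodies are placeholders, not a proposal for the actual functions:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for a value vector; not Drill's ValueVector API.
interface HashableVector {
    // Virtual call: each vector type hashes the value at `index`,
    // folding in the hash accumulated from earlier key columns.
    int hashEntry(int index, int seed);
}

// Hypothetical fixed-width integer vector.
final class IntVectorSketch implements HashableVector {
    private final int[] values;

    IntVectorSketch(int... values) { this.values = values; }

    @Override
    public int hashEntry(int index, int seed) {
        // Placeholder hash; a real implementation would pick a function
        // suited to fixed-width integers.
        return 31 * seed + Integer.hashCode(values[index]);
    }
}

// Hypothetical variable-width string vector.
final class VarCharVectorSketch implements HashableVector {
    private final String[] values;

    VarCharVectorSketch(String... values) { this.values = values; }

    @Override
    public int hashEntry(int index, int seed) {
        // Placeholder hash over the UTF-8 bytes; a different function
        // could be chosen here depending on the value length.
        byte[] bytes = values[index].getBytes(StandardCharsets.UTF_8);
        return 31 * seed + Arrays.hashCode(bytes);
    }
}

// Computes a row hash outside of any hash table by walking the key columns.
final class RowHasher {
    private final List<HashableVector> keyColumns;

    RowHasher(List<HashableVector> keyColumns) { this.keyColumns = keyColumns; }

    int hashRow(int rowIndex) {
        int hash = 0;
        for (HashableVector column : keyColumns) {
            hash = column.hashEntry(rowIndex, hash);
        }
        return hash;
    }
}
```

In this sketch the caller decides which vectors are key columns by passing only those to `RowHasher`, and the hash table would receive the already computed value instead of computing it itself.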

I don't remember whether a Jira was opened for that work. It would definitely 
simplify using a different hash function for each data type.

One last point: we may need to keep hashing compatible across the integer 
types, so ideally `HashValue(X as smallint) == HashValue(X as int) == 
HashValue(X as bigint)`.
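
One way to get that property, sketched under assumptions: widen every integer type to 64 bits before hashing, so the same numeric value always reaches the same function. The class `IntegerHashCompat` and its methods are hypothetical names, and `Long.hashCode` is only a placeholder for whichever 64-bit hash would actually be chosen:

```java
// Hypothetical helper; not an existing Drill class.
final class IntegerHashCompat {
    private IntegerHashCompat() {}

    // Single underlying hash over the widened value (placeholder function).
    static int hashLong(long value) {
        return Long.hashCode(value);
    }

    // smallint: sign-extended to long, then hashed.
    static int hash(short value) {
        return hashLong(value);
    }

    // int: sign-extended to long, then hashed.
    static int hash(int value) {
        return hashLong(value);
    }

    // bigint: hashed directly.
    static int hash(long value) {
        return hashLong(value);
    }
}
```

Sign extension keeps negative values consistent too, so `hash((short) -7)`, `hash(-7)` and `hash(-7L)` agree. The trade-off is that narrow types cannot use a cheaper hash specialized for their width.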

> Applying different hash function according to data types and data size
> ----------------------------------------------------------------------
>
>                 Key: DRILL-6825
>                 URL: https://issues.apache.org/jira/browse/DRILL-6825
>             Project: Apache Drill
>          Issue Type: Improvement
>          Components: Execution - Codegen
>            Reporter: weijie.tong
>            Priority: Major
>             Fix For: 1.16.0
>
>
> Different hash functions perform differently depending on data type and 
> data size. We should choose an appropriate one rather than always applying 
> Murmurhash.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
