[ https://issues.apache.org/jira/browse/ARROW-6024?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Micah Kornfield updated ARROW-6024:
-----------------------------------
    Description: 
Provide more hash algorithms to choose from for different scenarios. In 
particular, we provide the following hash algorithms:
 * Simple hasher: A hasher that uses the integer value itself as the hash code 
and does not perform any finalization. The computation is therefore extremely 
efficient, but the quality of the produced hash codes may be poor (for example, 
consecutive integers map to consecutive hash codes).

 * Murmur finalizing hasher: Finalizes the hash code with the Murmur hashing 
algorithm. Details of the algorithm can be found in 
[https://en.wikipedia.org/wiki/MurmurHash]. Murmur hashing is relatively 
expensive to compute, as it involves several integer multiplications. However, 
the produced hash codes are of good quality, in the sense that they are 
uniformly distributed over the range of possible hash values.
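To make the difference between the two hashers concrete, here is a minimal Java sketch. The `IntHasher` interface and all class names are hypothetical and are not taken from the Arrow code base; the Murmur finalizer shown is the 32-bit `fmix32` step of MurmurHash3, assumed here to be representative of the Murmur finalization the description refers to.

```java
/**
 * Hypothetical sketch, not the Arrow Java API: an integer hasher interface
 * with a "simple" (identity) implementation and a Murmur-finalizing one.
 */
public class HasherSketch {

    /** Hypothetical interface: maps a 32-bit integer to a 32-bit hash code. */
    interface IntHasher {
        int hash(int value);
    }

    /** Simple hasher: returns the integer as is, with no finalization. */
    static final class SimpleIntHasher implements IntHasher {
        @Override
        public int hash(int value) {
            return value;
        }
    }

    /** Murmur finalizing hasher: applies MurmurHash3's 32-bit finalizer (fmix32). */
    static final class MurmurFinalizingHasher implements IntHasher {
        @Override
        public int hash(int h) {
            // Alternate XOR-shifts (unsigned, hence >>>) with multiplications by
            // odd constants so that every input bit affects every output bit.
            h ^= h >>> 16;
            h *= 0x85ebca6b;
            h ^= h >>> 13;
            h *= 0xc2b2ae35;
            h ^= h >>> 16;
            return h;
        }
    }

    public static void main(String[] args) {
        IntHasher simple = new SimpleIntHasher();
        IntHasher murmur = new MurmurFinalizingHasher();
        // Consecutive keys stay consecutive under the simple hasher,
        // but are scattered by the Murmur finalizer.
        for (int key = 0; key < 4; key++) {
            System.out.printf("key=%d simple=%d murmur=%08x%n",
                key, simple.hash(key), murmur.hash(key));
        }
    }
}
```

Because each step of `fmix32` (an XOR-shift or a multiplication by an odd constant) is invertible, the finalizer is a bijection on 32-bit integers: it never introduces collisions, it only redistributes the bits of the input.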

  was:
Provide more hash algorithms to choose from for different scenarios. In 
particular, we provide the following hash algorithms:

* Simple hasher: A hasher that uses the integer value itself as the hash code 
and does not perform any finalization. The computation is therefore extremely 
efficient, but the quality of the produced hash codes may be poor.

* Murmur finalizing hasher: Finalizes the hash code with the Murmur hashing 
algorithm. Details of the algorithm can be found in 
https://en.wikipedia.org/wiki/MurmurHash. Murmur hashing is relatively 
expensive to compute, as it involves several integer multiplications. However, 
the produced hash codes are of good quality, in the sense that they are 
uniformly distributed over the range of possible hash values.

* Jenkins finalizing hasher: Finalizes the hash code with Bob Jenkins' 
algorithm. Details of this algorithm can be found in 
http://www.burtleburtle.net/bob/hash/integer.html. Jenkins hashing is less 
computationally expensive than Murmur hashing, as it involves no integer 
multiplications. Even so, the produced hash codes are also of good quality, in 
the sense that they are uniformly distributed over the range of possible hash 
values.

* Non-negative hasher: A wrapper around another hasher that makes the generated 
hash code non-negative. This is useful in scenarios such as hash tables, where 
the hash code is used to compute a bucket index.
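The Jenkins and non-negative hashers can be sketched in the same way. This is a minimal, hypothetical Java example: the class names are illustrative, and the Jenkins function used is one widely circulated 32-bit integer hash attributed to Bob Jenkins (six rounds of additions, XORs and shifts), assumed here to be the kind of function the linked page describes. The non-negative wrapper simply clears the sign bit of whatever hasher it wraps.

```java
/**
 * Hypothetical sketch, not the Arrow Java API: a Jenkins finalizing hasher
 * and a non-negative wrapper that can decorate any other hasher.
 */
public class JenkinsSketch {

    /** Hypothetical interface: maps a 32-bit integer to a 32-bit hash code. */
    interface IntHasher {
        int hash(int value);
    }

    /**
     * Jenkins finalizing hasher: a 32-bit integer hash attributed to Bob Jenkins
     * that uses only additions, XORs and shifts (no multiplications).
     * The original is written for uint32_t, so right shifts become >>> here.
     */
    static final class JenkinsFinalizingHasher implements IntHasher {
        @Override
        public int hash(int a) {
            a = (a + 0x7ed55d16) + (a << 12);
            a = (a ^ 0xc761c23c) ^ (a >>> 19);
            a = (a + 0x165667b1) + (a << 5);
            a = (a + 0xd3a2646c) ^ (a << 9);
            a = (a + 0xfd7046c5) + (a << 3);
            a = (a ^ 0xb55a4f09) ^ (a >>> 16);
            return a;
        }
    }

    /** Non-negative hasher: clears the sign bit of the wrapped hasher's result. */
    static final class NonNegativeHasher implements IntHasher {
        private final IntHasher delegate;

        NonNegativeHasher(IntHasher delegate) {
            this.delegate = delegate;
        }

        @Override
        public int hash(int value) {
            // Masking with Integer.MAX_VALUE is safe for every input; Math.abs is not,
            // because Math.abs(Integer.MIN_VALUE) is still negative.
            return delegate.hash(value) & Integer.MAX_VALUE;
        }
    }

    public static void main(String[] args) {
        IntHasher hasher = new NonNegativeHasher(new JenkinsFinalizingHasher());
        int numBuckets = 16;
        for (int key : new int[] {0, 1, -1, Integer.MIN_VALUE}) {
            int bucket = hasher.hash(key) % numBuckets;
            System.out.printf("key=%d bucket=%d%n", key, bucket);
        }
    }
}
```

Since the wrapped result is never negative, `hash % numBuckets` always yields a valid bucket index in `[0, numBuckets)`, which is the hash-table use case mentioned in the description.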


> [Java] Provide more hash algorithms 
> ------------------------------------
>
>                 Key: ARROW-6024
>                 URL: https://issues.apache.org/jira/browse/ARROW-6024
>             Project: Apache Arrow
>          Issue Type: New Feature
>          Components: Java
>            Reporter: Liya Fan
>            Assignee: Liya Fan
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 0.15.0
>
>          Time Spent: 8h 20m
>  Remaining Estimate: 0h



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)
