[ 
https://issues.apache.org/jira/browse/IGNITE-14769?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17454594#comment-17454594
 ] 

Andrey Mashenkov edited comment on IGNITE-14769 at 12/7/21, 11:33 AM:
----------------------------------------------------------------------

[~korlov] The PR looks good.
Benchmark results of all alternative hash functions look similar. The 
distribution looks acceptable. So, we can take any of them.
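
For reference, a distribution check of this kind can be sketched as below. This is only an illustration, not the benchmark from the PR: the class name PartitionDistribution, the key set (sequential ints), and the use of the murmur3 32-bit finalizer as a stand-in hash are all assumptions.

```java
import java.util.Arrays;

/** Hypothetical helper: counts how evenly hashed keys spread over partitions. */
final class PartitionDistribution {
    /** The murmur3 32-bit finalizer, used here only as a stand-in hash for int keys. */
    static int mix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    /** Maps keys 0..keys-1 to partitions and returns the per-partition counts. */
    static int[] histogram(int keys, int partitions) {
        int[] counts = new int[partitions];
        for (int key = 0; key < keys; key++) {
            // floorMod keeps the index non-negative for negative hash values.
            counts[Math.floorMod(mix(key), partitions)]++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = histogram(1_000_000, 128);
        int min = Arrays.stream(counts).min().getAsInt();
        int max = Arrays.stream(counts).max().getAsInt();
        System.out.println("min=" + min + " max=" + max);
    }
}
```

The closer min and max are to keys / partitions, the more even the distribution.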

I partially agree with Andrey, but:
* Any of these functions may have bugs, even murmur3. But murmur3 is widely 
used, so it is less risky.
Anyway, if we find a better function, we can run SMHasher benchmarks or other 
tests and check implementation quality on different platforms.
(A 10+% gain is probably not worth it, but e.g. 2x-10x or more is worth 
discussing.)
* I'm not sure what 'optimization for Java Runtime' means if the port shows 
comparable results.
I'd think about possible hardware optimizations (vectorization, AVX, SSE), and 
maybe consider using a native C++ implementation.
The risk here is that a native call to a C++ implementation may cause 
performance drops on some environments and/or hardware.

I think we should focus on refactoring and the possibility of switching to 
another function easily in the future. 
A utility-class method call backed by the murmur3 implementation is OK.
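
A minimal sketch of what such a utility class could look like. The name HashUtils and its method signatures are hypothetical and not taken from the PR, and the 32-bit x86 variant of MurmurHash3 with seed 0 is an assumption (the PR may use another variant). Callers depend only on HashUtils.hash(), so switching to another function later touches a single method.

```java
/** Hypothetical facade: callers see only hash(), not the algorithm behind it. */
final class HashUtils {
    private HashUtils() {
    }

    /** Single entry point; replacing the hash function means changing only this method. */
    static int hash(byte[] data) {
        return murmur3x86_32(data, 0, data.length, 0);
    }

    /** MurmurHash3 x86_32 over data[off .. off+len), reading blocks as little-endian. */
    static int murmur3x86_32(byte[] data, int off, int len, int seed) {
        final int c1 = 0xcc9e2d51;
        final int c2 = 0x1b873593;
        int h = seed;
        int nblocks = len >> 2;

        // Body: process 4-byte blocks.
        for (int i = 0; i < nblocks; i++) {
            int p = off + (i << 2);
            int k = (data[p] & 0xff)
                | ((data[p + 1] & 0xff) << 8)
                | ((data[p + 2] & 0xff) << 16)
                | ((data[p + 3] & 0xff) << 24);
            k *= c1;
            k = Integer.rotateLeft(k, 15);
            k *= c2;
            h ^= k;
            h = Integer.rotateLeft(h, 13);
            h = h * 5 + 0xe6546b64;
        }

        // Tail: the remaining 1..3 bytes, if any.
        int tail = off + (nblocks << 2);
        int k = 0;
        switch (len & 3) {
            case 3:
                k ^= (data[tail + 2] & 0xff) << 16; // fall through
            case 2:
                k ^= (data[tail + 1] & 0xff) << 8; // fall through
            case 1:
                k ^= data[tail] & 0xff;
                k *= c1;
                k = Integer.rotateLeft(k, 15);
                k *= c2;
                h ^= k;
                break;
            default:
                break;
        }

        // Finalization: mix in the length and avalanche the bits.
        h ^= len;
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }
}
```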


> Key hash calculation.
> ---------------------
>
>                 Key: IGNITE-14769
>                 URL: https://issues.apache.org/jira/browse/IGNITE-14769
>             Project: Ignite
>          Issue Type: Improvement
>            Reporter: Andrey Mashenkov
>            Assignee: Konstantin Orlov
>            Priority: Major
>              Labels: iep-54, ignite-3
>         Attachments: Partition count 100.png, Partition count 1024 (rnd).png, 
> Partition count 1024.png, Partition count 128 (rnd).png, Partition count 
> 128.png, Partition count 16 (rnd).png, Partition count 32.png, Partition 
> count 480 (rnd).png, Partition count 480.png, Partition count 8.png
>
>   Original Estimate: 96h
>  Remaining Estimate: 96h
>
> There are the following possible ways to calculate the hash:
>  # Update the hash on every write method call, as it works now.
>  # Calculate it over the whole key chunk (hash of byte[]): all columns, 
> including the null-map.
> Let's choose and implement the best way, along with a better hash function,
> e.g. xxHash64 [1], Murmur3 [2] released in Apache Commons, CityHash (from 
> Google) [3], FastHash32 [4].
>  
> [1] [https://github.com/Cyan4973/xxHash/]
> [2] [https://commons.apache.org/proper/commons-codec/jacoco/org.apache.commons.codec.digest/MurmurHash3.java.html]
> [3] [https://github.com/google/cityhash]
> [4] [https://github.com/rurban/smhasher/blob/master/fasthash.cpp]
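
The two options from the issue description can be contrasted with a rough sketch. All names here (KeyHashStrategies, IncrementalWriter, hashOfChunk) are hypothetical, and the 31-based combine and Arrays.hashCode are placeholders for whichever hash function is eventually chosen.

```java
import java.util.Arrays;

/** Hypothetical illustration of the two ways to compute a key hash. */
final class KeyHashStrategies {
    /** Option 1: fold every written value into a running hash. */
    static final class IncrementalWriter {
        private int hash;

        void writeInt(int v) {
            hash = 31 * hash + Integer.hashCode(v); // placeholder combine step
        }

        void writeString(String s) {
            hash = 31 * hash + s.hashCode(); // placeholder combine step
        }

        int hash() {
            return hash;
        }
    }

    /** Option 2: serialize the whole key chunk first, then hash its bytes in one pass. */
    static int hashOfChunk(byte[] keyChunk) {
        return Arrays.hashCode(keyChunk); // placeholder for e.g. murmur3 over byte[]
    }
}
```

With option 2 the result depends only on the serialized bytes (including the null-map), so any byte-oriented hash function can be plugged in without touching the write methods.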



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
