[ https://issues.apache.org/jira/browse/SPARK-50842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yibo Cai updated SPARK-50842:
-----------------------------
    Summary: Replace Murmur3_x86_32 with Murmur3_x64_32  (was: Replace 
Murmur3_x86_32 with Murmur_x64_32)

> Replace Murmur3_x86_32 with Murmur3_x64_32
> ------------------------------------------
>
>                 Key: SPARK-50842
>                 URL: https://issues.apache.org/jira/browse/SPARK-50842
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.5.5
>            Reporter: Yibo Cai
>            Priority: Major
>
> MurmurHash3 has two variants relevant here:
> - The x86 version generates a 32-bit hash value and processes 4 bytes in each
> iteration.
> - The x64 version generates a 128-bit hash value and processes 16 bytes in
> each iteration.
> Spark uses 
> [Murmur3_x86_32|https://github.com/apache/spark/blob/master/common/unsafe/src/main/java/org/apache/spark/unsafe/hash/Murmur3_x86_32.java].
> MurmurHash3 x64 runs much faster than the x86 version on 64-bit platforms. We
> can simply truncate the 128-bit hash value to 32 bits (or XOR its four 32-bit
> words) to keep the 32-bit hash type the current code expects, without losing
> hashing effectiveness.
> We observed a small but stable performance improvement on some TPC-DS
> benchmarks after replacing the x86 Murmur hash with the x64 version.
> Is it okay to replace the x86 Murmur hash with the x64 version? Are there any
> possible problems with this change, e.g., compatibility?
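
To make the difference between the two variants and the proposed folding concrete, here is a minimal Java sketch of the reference MurmurHash3_x86_32 and MurmurHash3_x64_128 algorithms (from Austin Appleby's SMHasher), followed by the two 32-bit folds mentioned in the description. It is not Spark's Murmur3_x86_32 class, whose handling of trailing bytes may differ from the reference, and the names Murmur3Sketch, x86_32, x64_128, truncateTo32 and xorFoldTo32 are hypothetical. The ticket does not say which 32 bits truncation would keep; the sketch assumes the low 32 bits of the first 64-bit half, which are the first four output bytes of the reference implementation on a little-endian machine.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Hypothetical sketch of the reference MurmurHash3 variants (not Spark's code). */
public final class Murmur3Sketch {

  private static int fmix32(int h) {
    h ^= h >>> 16; h *= 0x85ebca6b;
    h ^= h >>> 13; h *= 0xc2b2ae35;
    return h ^ (h >>> 16);
  }

  private static long fmix64(long k) {
    k ^= k >>> 33; k *= 0xff51afd7ed558ccdL;
    k ^= k >>> 33; k *= 0xc4ceb9fe1a85ec53L;
    return k ^ (k >>> 33);
  }

  /** Reference MurmurHash3_x86_32: consumes one 4-byte block per iteration. */
  public static int x86_32(byte[] data, int seed) {
    final int c1 = 0xcc9e2d51, c2 = 0x1b873593;
    ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
    int h1 = seed;
    int nblocks = data.length / 4;
    for (int i = 0; i < nblocks; i++) {
      int k1 = buf.getInt(i * 4) * c1;
      k1 = Integer.rotateLeft(k1, 15) * c2;
      h1 ^= k1;
      h1 = Integer.rotateLeft(h1, 13) * 5 + 0xe6546b64;
    }
    // Assemble the 1-3 trailing bytes little-endian, then mix them in.
    int tail = 0;
    for (int i = data.length - 1; i >= nblocks * 4; i--) {
      tail = (tail << 8) | (data[i] & 0xff);
    }
    if (data.length % 4 != 0) {
      h1 ^= Integer.rotateLeft(tail * c1, 15) * c2;
    }
    return fmix32(h1 ^ data.length);
  }

  /** Reference MurmurHash3_x64_128: consumes one 16-byte block per iteration. */
  public static long[] x64_128(byte[] data, int seed) {
    final long c1 = 0x87c37b91114253d5L, c2 = 0x4cf5ad432745937fL;
    ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.LITTLE_ENDIAN);
    long h1 = seed & 0xffffffffL, h2 = h1;   // the 32-bit seed is zero-extended
    int nblocks = data.length / 16;
    for (int i = 0; i < nblocks; i++) {
      long k1 = buf.getLong(i * 16), k2 = buf.getLong(i * 16 + 8);
      h1 ^= Long.rotateLeft(k1 * c1, 31) * c2;
      h1 = (Long.rotateLeft(h1, 27) + h2) * 5 + 0x52dce729;
      h2 ^= Long.rotateLeft(k2 * c2, 33) * c1;
      h2 = (Long.rotateLeft(h2, 31) + h1) * 5 + 0x38495ab5;
    }
    // Trailing 1-15 bytes: bytes 0-7 go into t1, bytes 8-14 into t2 (little-endian).
    long t1 = 0, t2 = 0;
    for (int i = nblocks * 16; i < data.length; i++) {
      int pos = i - nblocks * 16;
      long b = data[i] & 0xffL;
      if (pos < 8) t1 |= b << (8 * pos); else t2 |= b << (8 * (pos - 8));
    }
    h2 ^= Long.rotateLeft(t2 * c2, 33) * c1;  // no-op when t2 == 0
    h1 ^= Long.rotateLeft(t1 * c1, 31) * c2;  // no-op when t1 == 0
    h1 ^= data.length; h2 ^= data.length;
    h1 += h2; h2 += h1;
    h1 = fmix64(h1); h2 = fmix64(h2);
    h1 += h2; h2 += h1;
    return new long[] {h1, h2};
  }

  /** Fold option 1: truncate, keeping the low 32 bits of h1. */
  public static int truncateTo32(long[] h) {
    return (int) h[0];
  }

  /** Fold option 2: XOR the four 32-bit words of the 128-bit hash. */
  public static int xorFoldTo32(long[] h) {
    return (int) h[0] ^ (int) (h[0] >>> 32) ^ (int) h[1] ^ (int) (h[1] >>> 32);
  }

  public static void main(String[] args) {
    byte[] key = "spark".getBytes(StandardCharsets.UTF_8);
    int seed = 42;  // arbitrary example seed
    long[] h = x64_128(key, seed);
    System.out.printf("x86_32=%08x trunc=%08x xor=%08x%n",
        x86_32(key, seed), truncateTo32(h), xorFoldTo32(h));
  }
}
```

Both folds return an int, like the current hash, but their values generally differ from those of x86_32 for the same input and seed, so anything that depends on concrete hash values would see different results.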



--
This message was sent by Atlassian Jira
(v8.20.10#820010)