[
https://issues.apache.org/jira/browse/SPARK-50842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yibo Cai updated SPARK-50842:
-----------------------------
Summary: Replace Murmur3_x86_32 with Murmur3_x64_32 (was: Replace
Murmur3_x86_32 with Murmur_x64_32)
> Replace Murmur3_x86_32 with Murmur3_x64_32
> ------------------------------------------
>
> Key: SPARK-50842
> URL: https://issues.apache.org/jira/browse/SPARK-50842
> Project: Spark
> Issue Type: Improvement
> Components: Spark Core
> Affects Versions: 3.5.5
> Reporter: Yibo Cai
> Priority: Major
>
> MurmurHash3 has two variants:
> - The x86 version generates a 32-bit hash value and processes 4 bytes in each
> iteration.
> - The x64 version generates a 128-bit hash value and processes 16 bytes in each
> iteration.
> Spark uses
> [Murmur3_x86_32|https://github.com/apache/spark/blob/master/common/unsafe/src/main/java/org/apache/spark/unsafe/hash/Murmur3_x86_32.java].
> MurmurHash3 x64 runs much faster than the x86 version on 64-bit platforms. We can
> simply truncate the 128-bit hash value to 32 bits (or XOR its four 32-bit words) to
> stay compatible with the current code, without losing hashing effectiveness.
> A small yet stable performance improvement was observed on some TPC-DS benchmarks
> after replacing the x86 Murmur hash with the x64 version.
> Is it okay to replace the x86 Murmur hash with the x64 version? Are there any
> possible problems with this change, e.g., compatibility?
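
For illustration, the two ways of reducing a 128-bit hash to 32 bits mentioned in the description could be sketched as below. This is only a sketch under assumptions: the class and method names (HashFold, truncate, xorFold) are hypothetical and not part of Spark, the 128-bit value is assumed to arrive as two 64-bit halves h1 and h2, and the choice of which 32 bits to keep when truncating is arbitrary. The MurmurHash3 x64 computation itself is not shown.

```java
// Hypothetical helper, not Spark code: folds a 128-bit hash, given as two
// 64-bit halves, down to the 32-bit int that existing callers expect.
final class HashFold {
    private HashFold() {}

    // Truncation: keep only the low 32 bits of the first half.
    // (Which 32 bits to keep is an assumption made for this sketch.)
    static int truncate(long h1, long h2) {
        return (int) h1;
    }

    // XOR-fold: combine all four 32-bit words of the 128-bit value.
    static int xorFold(long h1, long h2) {
        long x = h1 ^ h2;            // fold 128 bits into 64
        return (int) (x ^ (x >>> 32)); // fold 64 bits into 32
    }

    public static void main(String[] args) {
        // Arbitrary sample halves, not real hash output.
        long h1 = 0x0123456789ABCDEFL;
        long h2 = 0xFEDCBA9876543210L;
        System.out.println(Integer.toHexString(truncate(h1, h2)));
        System.out.println(Integer.toHexString(xorFold(h1, h2)));
    }
}
```

Either reduction keeps the 32-bit return type, but the resulting values would differ from what Murmur3_x86_32 produces for the same input, which is the compatibility question raised in the description.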