Gopal V created HIVE-21531:
------------------------------
Summary: Vectorization: all NULL hashcodes are not computed using Murmur3
Key: HIVE-21531
URL: https://issues.apache.org/jira/browse/HIVE-21531
Project: Hive
Issue Type: Bug
Reporter: Gopal V
The comments in the vectorized hash computation refer to the MurmurHash
implementation (the one using 0x5bd1e995), while the non-vectorized codepath
uses the Murmur3 one (using 0xcc9e2d51).
The comment here is wrong: it mentions HashCodeUtil.murmurHash, but the code
below it calls Murmur3.hash32.
{code}
  /**
   * Batch compute the hash codes for all the serialized keys.
   *
   * NOTE: MAJOR MAJOR ASSUMPTION:
   *     We assume that HashCodeUtil.murmurHash produces the same result
   *     as MurmurHash.hash with seed = 0 (the method used by ReduceSinkOperator for
   *     UNIFORM distribution).
   */
  protected void computeSerializedHashCodes() {
    int offset = 0;
    int keyLength;
    byte[] bytes = output.getData();
    for (int i = 0; i < nonNullKeyCount; i++) {
      keyLength = serializedKeyLengths[i];
      hashCodes[i] = Murmur3.hash32(bytes, offset, keyLength, 0);
      offset += keyLength;
    }
  }
{code}
However, the Vector RS (ReduceSink) operator follows the wrong comment and
computes the NULL key hash code with HashCodeUtil instead of Murmur3:
{code}
System.arraycopy(nullKeyOutput.getData(), 0, nullBytes, 0, nullBytesLength);
nullKeyHashCode = HashCodeUtil.calculateBytesHashCode(nullBytes, 0, nullBytesLength);
{code}
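To illustrate why the two snippets above cannot agree, here is a minimal,
self-contained sketch. It is not Hive's code: HashMismatchSketch, murmur2 and
murmur3 are hypothetical names, and the bodies are simplified
re-implementations of a MurmurHash2-style hash (multiplier 0x5bd1e995, seeded
with the key length) and of 32-bit Murmur3 (0xcc9e2d51). The assumption is
that they are close enough to the real classes to show that the same key
bytes hash to different values on the two paths.

```java
public class HashMismatchSketch {

    // Reads four bytes starting at index i as a little-endian int.
    private static int readIntLE(byte[] d, int i) {
        return (d[i] & 0xff) | ((d[i + 1] & 0xff) << 8)
            | ((d[i + 2] & 0xff) << 16) | ((d[i + 3] & 0xff) << 24);
    }

    // MurmurHash2-style hash (hypothetical stand-in for the HashCodeUtil path).
    static int murmur2(byte[] data, int offset, int length) {
        final int m = 0x5bd1e995;
        int h = length;                       // assumption: seed 0 XOR length
        int end4 = offset + (length & ~3);    // end of the whole 4-byte blocks
        for (int i = offset; i < end4; i += 4) {
            int k = readIntLE(data, i);
            k *= m;
            k ^= k >>> 24;
            k *= m;
            h *= m;
            h ^= k;
        }
        int tail = length & 3;                // 0..3 leftover bytes
        if (tail != 0) {
            if (tail == 3) h ^= (data[end4 + 2] & 0xff) << 16;
            if (tail >= 2) h ^= (data[end4 + 1] & 0xff) << 8;
            h ^= (data[end4] & 0xff);
            h *= m;
        }
        h ^= h >>> 13;
        h *= m;
        h ^= h >>> 15;
        return h;
    }

    // 32-bit Murmur3 (hypothetical stand-in for the Murmur3.hash32 path).
    static int murmur3(byte[] data, int offset, int length, int seed) {
        final int c1 = 0xcc9e2d51;
        final int c2 = 0x1b873593;
        int h = seed;
        int end4 = offset + (length & ~3);
        for (int i = offset; i < end4; i += 4) {
            int k = readIntLE(data, i);
            k *= c1;
            k = Integer.rotateLeft(k, 15);
            k *= c2;
            h ^= k;
            h = Integer.rotateLeft(h, 13);
            h = h * 5 + 0xe6546b64;
        }
        int k = 0;
        int tail = length & 3;
        if (tail == 3) k ^= (data[end4 + 2] & 0xff) << 16;
        if (tail >= 2) k ^= (data[end4 + 1] & 0xff) << 8;
        if (tail >= 1) {
            k ^= (data[end4] & 0xff);
            k *= c1;
            k = Integer.rotateLeft(k, 15);
            k *= c2;
            h ^= k;
        }
        h ^= length;                          // finalization mix
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    public static void main(String[] args) {
        // A one-byte key, standing in for a serialized all-NULL key.
        byte[] nullBytes = new byte[] {0};
        int viaMurmur2 = murmur2(nullBytes, 0, nullBytes.length);
        int viaMurmur3 = murmur3(nullBytes, 0, nullBytes.length, 0);
        System.out.println("murmur2: 0x" + Integer.toHexString(viaMurmur2));
        System.out.println("murmur3: 0x" + Integer.toHexString(viaMurmur3));
        System.out.println("match:   " + (viaMurmur2 == viaMurmur3));
    }
}
```

If the sketch is representative, the NULL key hash code would only match the
non-NULL path once both use the same function and seed.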