[ https://issues.apache.org/jira/browse/HIVE-21531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gopal V updated HIVE-21531:
---------------------------
    Affects Version/s: 3.1.1

> Vectorization: all NULL hashcodes are not computed using Murmur3
> ----------------------------------------------------------------
>
>                 Key: HIVE-21531
>                 URL: https://issues.apache.org/jira/browse/HIVE-21531
>             Project: Hive
>          Issue Type: Bug
>    Affects Versions: 3.1.1
>            Reporter: Gopal V
>            Assignee: Gopal V
>            Priority: Critical
>         Attachments: HIVE-21531.WIP.patch
>
>
> The comments in the vectorized hash computation call out the MurmurHash
> implementation (the one using 0x5bd1e995), while the non-vectorized codepath
> calls out the Murmur3 one (using 0xcc9e2d51).
> The comments here are wrong:
> {code}
>  /**
>    * Batch compute the hash codes for all the serialized keys.
>    *
>    * NOTE: MAJOR MAJOR ASSUMPTION:
>    *     We assume that HashCodeUtil.murmurHash produces the same result
>    *     as MurmurHash.hash with seed = 0 (the method used by ReduceSinkOperator for
>    *     UNIFORM distribution).
>    */
>   protected void computeSerializedHashCodes() {
>     int offset = 0;
>     int keyLength;
>     byte[] bytes = output.getData();
>     for (int i = 0; i < nonNullKeyCount; i++) {
>       keyLength = serializedKeyLengths[i];
>       hashCodes[i] = Murmur3.hash32(bytes, offset, keyLength, 0);
>       offset += keyLength;
>     }
>   }
> {code}
> but the vectorized ReduceSink (RS) operator follows the wrong comment when hashing the all-NULL key:
> {code}
>       System.arraycopy(nullKeyOutput.getData(), 0, nullBytes, 0, nullBytesLength);
>       nullKeyHashCode = HashCodeUtil.calculateBytesHashCode(nullBytes, 0, nullBytesLength);
> {code}
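
To illustrate why mixing the two hash functions matters, here is a minimal, self-contained Java sketch of the 32-bit x86 Murmur3 hash (constant 0xcc9e2d51) and the older MurmurHash2 (constant 0x5bd1e995). It is written from the published reference algorithms, not copied from Hive's Murmur3 or HashCodeUtil classes. The class name HashMismatchDemo, the partition helper, and the reducer count are hypothetical; partition assumes the reducer is picked as (hash & Integer.MAX_VALUE) % numReducers, the formula Hadoop's HashPartitioner uses. Under that assumption, identical serialized key bytes can map to different reducers depending on which function produced the hash.

```java
import java.nio.charset.StandardCharsets;

public class HashMismatchDemo {

    // Reads 4 bytes starting at i as a little-endian int.
    private static int readIntLE(byte[] data, int i) {
        return (data[i] & 0xff)
                | ((data[i + 1] & 0xff) << 8)
                | ((data[i + 2] & 0xff) << 16)
                | ((data[i + 3] & 0xff) << 24);
    }

    // MurmurHash3 x86_32 (the variant using 0xcc9e2d51).
    static int murmur3Hash32(byte[] data, int offset, int length, int seed) {
        final int c1 = 0xcc9e2d51;
        final int c2 = 0x1b873593;
        int h1 = seed;
        int nblocks = length >> 2;
        for (int i = 0; i < nblocks; i++) {
            int k1 = readIntLE(data, offset + (i << 2));
            k1 *= c1;
            k1 = Integer.rotateLeft(k1, 15);
            k1 *= c2;
            h1 ^= k1;
            h1 = Integer.rotateLeft(h1, 13);
            h1 = h1 * 5 + 0xe6546b64;
        }
        // Mix in the 1-3 trailing bytes (intentional switch fallthrough).
        int tail = offset + (nblocks << 2);
        int k = 0;
        switch (length & 3) {
            case 3: k ^= (data[tail + 2] & 0xff) << 16;
            case 2: k ^= (data[tail + 1] & 0xff) << 8;
            case 1: k ^= (data[tail] & 0xff);
                k *= c1;
                k = Integer.rotateLeft(k, 15);
                k *= c2;
                h1 ^= k;
        }
        // Finalization mix (fmix32).
        h1 ^= length;
        h1 ^= h1 >>> 16;
        h1 *= 0x85ebca6b;
        h1 ^= h1 >>> 13;
        h1 *= 0xc2b2ae35;
        h1 ^= h1 >>> 16;
        return h1;
    }

    // MurmurHash2 32-bit (the variant using 0x5bd1e995).
    static int murmurHash2(byte[] data, int offset, int length, int seed) {
        final int m = 0x5bd1e995;
        final int r = 24;
        int h = seed ^ length;
        int nblocks = length >> 2;
        for (int i = 0; i < nblocks; i++) {
            int k = readIntLE(data, offset + (i << 2));
            k *= m;
            k ^= k >>> r;
            k *= m;
            h *= m;
            h ^= k;
        }
        // Mix in the 1-3 trailing bytes (intentional switch fallthrough).
        int tail = offset + (nblocks << 2);
        switch (length & 3) {
            case 3: h ^= (data[tail + 2] & 0xff) << 16;
            case 2: h ^= (data[tail + 1] & 0xff) << 8;
            case 1: h ^= (data[tail] & 0xff);
                h *= m;
        }
        h ^= h >>> 13;
        h *= m;
        h ^= h >>> 15;
        return h;
    }

    // Hypothetical reducer choice, assuming a HashPartitioner-style formula.
    static int partition(int hash, int numReducers) {
        return (hash & Integer.MAX_VALUE) % numReducers;
    }

    public static void main(String[] args) {
        byte[] key = "foo".getBytes(StandardCharsets.UTF_8);
        int numReducers = 8; // hypothetical reducer count
        int h3 = murmur3Hash32(key, 0, key.length, 0);
        int h2 = murmurHash2(key, 0, key.length, 0);
        System.out.printf("Murmur3:     %d -> reducer %d%n", h3, partition(h3, numReducers));
        System.out.printf("MurmurHash2: %d -> reducer %d%n", h2, partition(h2, numReducers));
    }
}
```

Because the two functions disagree on the same input bytes, a key hashed with one in the vectorized path and with the other in the row-mode path can be routed to different reducers.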



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
