[ 
https://issues.apache.org/jira/browse/AVRO-2905?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kyoungha Min updated AVRO-2905:
-------------------------------
    Description: 
Bug : utf8 hash reset is not correctly done, as a result, returns different 
hash when the values are logically equal.

 

cause commit : 

 
[https://github.com/apache/avro/commit/d5d1cd70e920f45468bf696d3668b496a9adbab3#diff-7001731cdafe4fc24f86e71c7e8a0b61]

[https://github.com/apache/avro/commit/8e345c48e8f9708c27af63754965777314dac85c#diff-7001731cdafe4fc24f86e71c7e8a0b61]

 

reproduce : 
{code:java}
 org.apache.avro.util.Utf8 utf8 = new org.apache.avro.util.Utf8();
 utf8.set("hello");
 System.out.println(utf8.hashCode()); // 99162322
 utf8.set("hello");
 System.out.println(utf8.hashCode()); // -1538739392
{code}
 

Suggested fix.
 # get rid of `hasHash` field. In my opinion, it leads to more bug-prone by 
adding additional variable to take care of. `hash != 0` should be enough to 
check the presence of hash. Redundant computation for `0 hash` elements should 
be negligible.
 # reset `hash = 0` on every mutation call.
 # not needed, but I want to see some minor optimization like avoiding memory 
access by creating local variable (int hash = 0; .... this.hash = hash).

 

  was:
Bug : utf8 hash reset is not correctly done, as a result, returns different 
hash when the values are logically equal.

 

cause commit : 

 
[https://github.com/apache/avro/commit/d5d1cd70e920f45468bf696d3668b496a9adbab3#diff-7001731cdafe4fc24f86e71c7e8a0b61]

[https://github.com/apache/avro/commit/8e345c48e8f9708c27af63754965777314dac85c#diff-7001731cdafe4fc24f86e71c7e8a0b61]

 

reproduce : 

```

org.apache.avro.util.Utf8 utf8 = new org.apache.avro.util.Utf8();
utf8.set("hello");
System.out.println(utf8.hashCode()); // 99162322
utf8.set("hello");
System.out.println(utf8.hashCode()); // -1538739392

```

 

Suggested fix.
 # get rid of `hasHash` field. In my opinion, it leads to more bug-prone by 
adding additional variable to take care of. `hash != 0` should be enough to 
check the presence of hash. Redundant computation for `0 hash` elements should 
be negligible.
 # reset `hash = 0` on every mutation call.
 # I want to see some minor optimization like avoiding memory access by 
creating local variable (int hash = 0; .... this.hash = hash).

 


> org.apache.avro.util.Utf8 bug in `hashCode`
> -------------------------------------------
>
>                 Key: AVRO-2905
>                 URL: https://issues.apache.org/jira/browse/AVRO-2905
>             Project: Apache Avro
>          Issue Type: Bug
>          Components: java
>    Affects Versions: 1.10.0
>            Reporter: Kyoungha Min
>            Priority: Major
>
> Bug : utf8 hash reset is not correctly done, as a result, returns different 
> hash when the values are logically equal.
>  
> cause commit : 
>  
> [https://github.com/apache/avro/commit/d5d1cd70e920f45468bf696d3668b496a9adbab3#diff-7001731cdafe4fc24f86e71c7e8a0b61]
> [https://github.com/apache/avro/commit/8e345c48e8f9708c27af63754965777314dac85c#diff-7001731cdafe4fc24f86e71c7e8a0b61]
>  
> reproduce : 
> {code:java}
>  org.apache.avro.util.Utf8 utf8 = new org.apache.avro.util.Utf8();
>  utf8.set("hello");
>  System.out.println(utf8.hashCode()); // 99162322
>  utf8.set("hello");
>  System.out.println(utf8.hashCode()); // -1538739392
> {code}
>  
> Suggested fix.
>  # get rid of `hasHash` field. In my opinion, it leads to more bug-prone by 
> adding additional variable to take care of. `hash != 0` should be enough to 
> check the presence of hash. Redundant computation for `0 hash` elements 
> should be negligible.
>  # reset `hash = 0` on every mutation call.
>  # not needed, but I want to see some minor optimization like avoiding memory 
> access by creating local variable (int hash = 0; .... this.hash = hash).
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

Reply via email to