[ 
https://issues.apache.org/jira/browse/STORM-139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14632121#comment-14632121
 ] 

ASF GitHub Bot commented on STORM-139:
--------------------------------------

GitHub user d2r reopened a pull request:

    https://github.com/apache/storm/pull/641

    [STORM-139] Correctly hash byte array tuple values

    * Unit test for correctness is included
    * The following tests showed no discernible difference in latency or throughput:
      * word_count, stats from the last 10 min after running for more than 10 min
      * word_count modified to send byte[] words instead of Strings, under the same conditions
    
    This should handle the kinds of Objects we care about: reference types, all primitive array types, and reference array types.
    
    
    Relying on Java's Object#hashCode is a problem for distributed systems, as was pointed out in the JIRA, because its value is not guaranteed to be the same from one JVM to another.
    
    However, the behavior of [String#hashCode](https://docs.oracle.com/javase/7/docs/api/java/lang/String.html#hashCode%28%29) and the implementation of [java.util.Arrays#hashCode](http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/Arrays.java#Arrays.hashCode%28float[]%29) for all primitive array types are defined and should be consistent across JVMs.

    This leaves Objects and Object[] that are not Strings. In these cases, this change still trusts the hashCode method provided by those classes.
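
The dispatch described above could look roughly like the following. This is a minimal sketch, not the code from the pull request: the class name `TupleValueHash` and the method `contentHash` are hypothetical, and the use of `Arrays.deepHashCode` for reference arrays is an assumption about how such arrays might be handled.

```java
import java.util.Arrays;

final class TupleValueHash {
    // Hypothetical helper for illustration; not the code from the pull request.
    // Primitive arrays are hashed by content; everything else falls back to
    // the value's own hashCode().
    static int contentHash(Object value) {
        if (value == null) return 0;
        if (value instanceof byte[]) return Arrays.hashCode((byte[]) value);
        if (value instanceof short[]) return Arrays.hashCode((short[]) value);
        if (value instanceof int[]) return Arrays.hashCode((int[]) value);
        if (value instanceof long[]) return Arrays.hashCode((long[]) value);
        if (value instanceof char[]) return Arrays.hashCode((char[]) value);
        if (value instanceof float[]) return Arrays.hashCode((float[]) value);
        if (value instanceof double[]) return Arrays.hashCode((double[]) value);
        if (value instanceof boolean[]) return Arrays.hashCode((boolean[]) value);
        // Assumption: reference arrays are hashed element by element, which
        // still trusts each element's own hashCode().
        if (value instanceof Object[]) return Arrays.deepHashCode((Object[]) value);
        return value.hashCode();
    }

    public static void main(String[] args) {
        byte[] a = {1, 2, 3};
        byte[] b = {1, 2, 3};
        // Two distinct arrays with equal contents hash to the same value.
        System.out.println(contentHash(a) == contentHash(b));
        // Non-array values keep their own hashCode().
        System.out.println(contentHash("storm") == "storm".hashCode());
    }
}
```

Note that the last two branches remain only as consistent across JVMs as the hashCode implementations of the element and value classes themselves.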
    
    In practice, bad cases should be rare. When hashing is inconsistent and some form of partitioning is used, however, the resulting problems could be very difficult to debug.
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/d2r/storm storm-138-byte-array-hashcode

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/storm/pull/641.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #641
    
----
commit b132520adb00def6d03ef020b1cb5973eb3b3519
Author: Derek Dagit <[email protected]>
Date:   2015-07-17T23:45:39Z

    Correctly hash byte array tuple values

----


> hashCode does not work for byte[]
> ---------------------------------
>
>                 Key: STORM-139
>                 URL: https://issues.apache.org/jira/browse/STORM-139
>             Project: Apache Storm
>          Issue Type: Bug
>            Reporter: James Xu
>            Priority: Minor
>
> https://github.com/nathanmarz/storm/issues/245
> Storm should use a different hashCode method when getting the hash for a 
> byte[] array, since the default one uses the object identity. Should check 
> the behavior on other arrays as well
> ----------
> xiaokang: I tested byte[] and other arrays. The hashCode of an array is the 
> array object's identity.
> I also tested that java.util.Arrays.hashCode(xx[]) is based on the array 
> elements' hash codes. It may be OK to change the list-hash-code function in 
> tuple.clj to fix the problem.
> ----------
> Sirwellington: you may want to read this:
> http://martin.kleppmann.com/2012/06/18/java-hashcode-unsafe-for-distributed-systems.html



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
