GitHub user d2r opened a pull request:
https://github.com/apache/storm/pull/641
[STORM-139] Correctly hash byte array tuple values
* A unit test for correctness is included
* The following tests showed no discernible difference in latency or
  throughput:
  * word_count, last 10 min stats after running for more than 10 min
  * word_count modified to send byte[] words instead of Strings, same
    conditions
This should handle the kinds of Objects we care about: reference types, all
primitive array types, and reference array types.
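Relying on Java's .hashCode is a problem for distributed systems, as was
pointed out in the JIRA. Arrays do not override Object#hashCode, so two byte[]
values with identical contents generally hash differently, even within a
single JVM.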
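A minimal sketch of the problem, written as a standalone Java program rather than Storm code (the class `IdentityHashDemo` and its `contentHash` helper are hypothetical names used only for illustration). It contrasts the identity hash of two equal byte[] instances with the content-based `java.util.Arrays.hashCode`.

```java
import java.util.Arrays;

// Hypothetical demo, not Storm code: identity hash vs. content hash for byte[].
final class IdentityHashDemo {
    // Content-based hash: depends only on the array's elements.
    static int contentHash(byte[] value) {
        return Arrays.hashCode(value);
    }

    public static void main(String[] args) {
        byte[] a = {104, 105}; // the bytes of "hi"
        byte[] b = {104, 105}; // same contents, separate array instance

        // byte[] does not override hashCode, so these are identity hashes:
        // they usually differ, and nothing ties them to the array contents.
        System.out.println("a.hashCode() = " + a.hashCode());
        System.out.println("b.hashCode() = " + b.hashCode());

        // Both content hashes are 4290, in this JVM and any other.
        System.out.println("contentHash(a) = " + contentHash(a));
        System.out.println("contentHash(b) = " + contentHash(b));
    }
}
```

Only the last two lines of output are predictable; the identity hashes will vary between runs and JVMs.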
However, the behavior of
[String#hashCode](https://docs.oracle.com/javase/7/docs/api/java/lang/String.html#hashCode%28%29),
and the implementation of
[java.util.Arrays#hashCode](http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/Arrays.java#Arrays.hashCode%28float[]%29)
for all primitive array types, are defined and should be consistent across JVMs.
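To make "defined" concrete, here is a sketch that re-implements both documented formulas by hand and checks them against the JDK (the class `SpecifiedHashes` and its methods are hypothetical names). `String#hashCode` is documented as `s[0]*31^(n-1) + ... + s[n-1]` in int arithmetic, and `Arrays.hashCode(byte[])` is documented to match the hashCode of a `List` of `Byte` values, which starts at 1 and folds in each element as `31 * h + e`.

```java
import java.util.Arrays;

// Hypothetical re-implementation of the documented JDK hash formulas.
final class SpecifiedHashes {
    // s[0]*31^(n-1) + ... + s[n-1], evaluated in Horner form; int overflow
    // wraps exactly as in the JDK. The empty string hashes to 0.
    static int stringHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    // Same value as a List<Byte>.hashCode(): start at 1, then 31 * h + e.
    // Negative bytes are sign-extended, matching Byte#hashCode.
    static int byteArrayHash(byte[] a) {
        if (a == null) {
            return 0; // Arrays.hashCode(null) is also 0
        }
        int h = 1;
        for (byte e : a) {
            h = 31 * h + e;
        }
        return h;
    }

    public static void main(String[] args) {
        String s = "storm";
        byte[] bytes = {1, -2, 3};
        System.out.println(stringHash(s) == s.hashCode());                 // true
        System.out.println(byteArrayHash(bytes) == Arrays.hashCode(bytes)); // true
    }
}
```

Because both results are pure functions of the contents, two workers computing them for the same value will agree.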
This leaves Objects, and Object[] elements, that are not Strings. In these
cases, this change still trusts the hashCode method those classes provide.
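The dispatch described above (content hashes for every primitive array type, a class-provided hashCode for everything else) can be sketched as a single helper. This is a hypothetical illustration, not the code in the pull request: the name `TupleValueHashing.hash` is made up, and using `Arrays.deepHashCode` for reference arrays is an assumption of this sketch.

```java
import java.util.Arrays;

// Hypothetical helper; not the implementation from the pull request.
final class TupleValueHashing {
    static int hash(Object o) {
        if (o == null) return 0;
        // Every primitive array type gets a content-based hash.
        if (o instanceof byte[]) return Arrays.hashCode((byte[]) o);
        if (o instanceof short[]) return Arrays.hashCode((short[]) o);
        if (o instanceof int[]) return Arrays.hashCode((int[]) o);
        if (o instanceof long[]) return Arrays.hashCode((long[]) o);
        if (o instanceof char[]) return Arrays.hashCode((char[]) o);
        if (o instanceof float[]) return Arrays.hashCode((float[]) o);
        if (o instanceof double[]) return Arrays.hashCode((double[]) o);
        if (o instanceof boolean[]) return Arrays.hashCode((boolean[]) o);
        // Reference arrays (assumption: deepHashCode): nested arrays are hashed
        // by content, other elements fall back to their own hashCode.
        if (o instanceof Object[]) return Arrays.deepHashCode((Object[]) o);
        // Strings have a specified hashCode; other classes are trusted as-is.
        return o.hashCode();
    }

    public static void main(String[] args) {
        // Equal contents give equal hashes, whatever the array type.
        System.out.println(hash(new byte[] {1, 2}) == hash(new byte[] {1, 2})); // true
        System.out.println(hash(new Object[] {"k", new byte[] {7}})
                == hash(new Object[] {"k", new byte[] {7}}));                  // true
    }
}
```

The final `o.hashCode()` branch is exactly where the remaining risk lives: a user class whose hashCode is identity-based, or otherwise not stable across JVMs, will still hash inconsistently.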
In practice, bad cases should be rare. But when hashing is inconsistent and
tuples are partitioned by hash (for example, with a fields grouping), the
resulting misrouting could be very difficult to debug.
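Why partitioning makes this hard to spot: a hash partitioner silently sends the "same" key to different tasks when the hash is not content-based. The sketch below assumes a simple `floorMod(hash, numTasks)` scheme; the class `HashPartitionDemo` and method `chooseTask` are hypothetical and are not Storm's actual grouping code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical hash partitioner; the modulo scheme is an assumption.
final class HashPartitionDemo {
    // Map a key hash onto one of numTasks downstream tasks.
    // Math.floorMod keeps the index non-negative even for negative hashes.
    static int chooseTask(int keyHash, int numTasks) {
        return Math.floorMod(keyHash, numTasks);
    }

    public static void main(String[] args) {
        int numTasks = 4;
        byte[] fromWorkerA = "user-42".getBytes(StandardCharsets.UTF_8);
        byte[] fromWorkerB = "user-42".getBytes(StandardCharsets.UTF_8);

        // Content hash: equal bytes always land on the same task.
        int a = chooseTask(Arrays.hashCode(fromWorkerA), numTasks);
        int b = chooseTask(Arrays.hashCode(fromWorkerB), numTasks);
        System.out.println("content hash routes to tasks " + a + " and " + b);

        // Identity hash: equal bytes may land on different tasks, so state
        // for one key can end up split across tasks without any error.
        int c = chooseTask(fromWorkerA.hashCode(), numTasks);
        int d = chooseTask(fromWorkerB.hashCode(), numTasks);
        System.out.println("identity hash routes to tasks " + c + " and " + d);
    }
}
```

Nothing fails loudly in the identity-hash case; the only symptom is wrong per-key results downstream, which is what makes it hard to trace.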
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/d2r/storm storm-138-byte-array-hashcode
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/storm/pull/641.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #641
----
commit 46b8c9e71b4b9b3f0e8c762fe070793e3f6d4971
Author: Derek Dagit <[email protected]>
Date: 2015-07-17T21:54:32Z
Correctly hash byte array tuple values
----