GitHub user d2r reopened a pull request:
https://github.com/apache/storm/pull/641
[STORM-139] Correctly hash byte array tuple values
* Unit test for correctness is included
* The following tests showed no discernible difference in latency or
throughput:
  * word_count, using the last-10-minute stats after running for more than
10 minutes
  * word_count modified to send byte[] words instead of Strings, under the
same conditions
This should handle the kinds of Objects we care about: reference types, all
primitive array types, and reference array types.
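As a rough illustration of the three cases listed above, here is a minimal sketch of a hash helper that dispatches on the runtime type of a value. The class name `ArrayAwareHash` and the method `hash` are hypothetical and are not taken from this pull request; the sketch only assumes the standard `java.util.Arrays` methods.

```java
import java.util.Arrays;

// Hypothetical helper for illustration only; not the code in this pull request.
final class ArrayAwareHash {
    private ArrayAwareHash() {}

    // Hashes array values by content and everything else by its own hashCode().
    static int hash(Object value) {
        if (value == null) return 0;
        // Primitive array types: Arrays.hashCode is computed from the elements.
        if (value instanceof byte[]) return Arrays.hashCode((byte[]) value);
        if (value instanceof short[]) return Arrays.hashCode((short[]) value);
        if (value instanceof int[]) return Arrays.hashCode((int[]) value);
        if (value instanceof long[]) return Arrays.hashCode((long[]) value);
        if (value instanceof char[]) return Arrays.hashCode((char[]) value);
        if (value instanceof float[]) return Arrays.hashCode((float[]) value);
        if (value instanceof double[]) return Arrays.hashCode((double[]) value);
        if (value instanceof boolean[]) return Arrays.hashCode((boolean[]) value);
        // Reference array types: deepHashCode also descends into nested arrays.
        if (value instanceof Object[]) return Arrays.deepHashCode((Object[]) value);
        // Plain reference types: trust the class's own hashCode().
        return value.hashCode();
    }
}
```

With a helper like this, two distinct `byte[]` instances holding the same bytes hash to the same value, which is not the case for the identity-based `hashCode()` that arrays inherit from `Object`.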
Using Java's .hashCode is a problem for distributed systems, as was pointed
out in the JIRA, because Object#hashCode is not guaranteed to return the same
value in different JVMs.
However, the behavior of
[String#hashCode](https://docs.oracle.com/javase/7/docs/api/java/lang/String.html#hashCode%28%29),
and the implementation of
[java.util.Arrays#hashCode](http://grepcode.com/file/repository.grepcode.com/java/root/jdk/openjdk/6-b14/java/util/Arrays.java#Arrays.hashCode%28float[]%29)
for all primitive array types is defined and should be consistent across JVMs.
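To make the "defined" part concrete, the sketch below recomputes both hashes by hand from the documented formulas: `String#hashCode` is `s[0]*31^(n-1) + ... + s[n-1]`, and `Arrays.hashCode(byte[])` starts at 1 and folds in each element with the same multiplier. The class and method names are hypothetical, chosen only for this illustration.

```java
// Hypothetical illustration; recomputes the documented hash formulas by hand.
final class SpecifiedHashes {
    private SpecifiedHashes() {}

    // Documented String#hashCode: s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1].
    static int stringHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    // Same result as java.util.Arrays.hashCode(byte[]): start at 1, fold each byte.
    static int byteArrayHash(byte[] a) {
        if (a == null) return 0;
        int result = 1;
        for (byte b : a) {
            result = 31 * result + b;
        }
        return result;
    }
}
```

Because both results depend only on the content, any JVM that follows the specification produces the same value for the same input.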
This leaves Objects and Object[] that are not Strings. In these cases, this
change still trusts the hashCode method provided in those classes.
In practice, bad cases should be rare. However, when hashing is inconsistent
and some sort of partitioning is used, the resulting misrouting could be very
difficult to debug.
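The sketch below shows why inconsistent hashing matters once partitioning is involved. It assumes a simple hash-modulo scheme for picking a target task; the names `PartitionSketch`, `chooseTask`, and `taskForBytes` are hypothetical and do not describe Storm's actual grouping code.

```java
import java.util.Arrays;

// Hypothetical illustration of hash-based partitioning; not Storm's grouping code.
final class PartitionSketch {
    private PartitionSketch() {}

    // Maps a hash to a task index in [0, numTasks), including for negative hashes.
    static int chooseTask(int hash, int numTasks) {
        return Math.floorMod(hash, numTasks);
    }

    // Content-based hashing: equal bytes always map to the same task.
    static int taskForBytes(byte[] key, int numTasks) {
        return chooseTask(Arrays.hashCode(key), numTasks);
    }
}
```

If the hash were instead taken from `key.hashCode()`, two arrays with identical bytes would usually get different hashes, so the same logical key could be sent to different tasks with no error reported.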
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/d2r/storm storm-138-byte-array-hashcode
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/storm/pull/641.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #641
----
commit b132520adb00def6d03ef020b1cb5973eb3b3519
Author: Derek Dagit <[email protected]>
Date: 2015-07-17T23:45:39Z
Correctly hash byte array tuple values
----