Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/storm/pull/2558#discussion_r168330302
--- Diff: storm-client/src/jvm/org/apache/storm/utils/TupleUtils.java ---
@@ -41,7 +41,7 @@ public static boolean isTick(Tuple tuple) {
}
public static <T> int chooseTaskIndex(List<T> keys, int numTasks) {
- return Math.abs(listHashCode(keys)) % numTasks;
+ return Math.abs(listHashCode(keys) % numTasks);
--- End diff --
Since we are now on JDK 1.8, Math.floorMod(listHashCode(keys), numTasks)
would be right implementation to make distribution be consistent with 1.x
version line.
https://github.com/apache/storm/blob/a689d78f81e8783b442fc96ef2cbc7295c7a931d/storm-core/src/clj/org/apache/storm/daemon/executor.clj#L48-L56
https://docs.oracle.com/javase/8/docs/api/java/lang/Math.html#floorMod-int-int-
https://clojuredocs.org/clojure.core/mod
`Math.abs(listHashCode(keys) % numTasks)` !=
`Math.floorMod(listHashCode(keys), numTasks)`, but
`Math.floorMod(listHashCode(keys), numTasks)` == `(mod listHashCode(keys)
numTasks)`.
Yes your implementation would be right about guaranteeing positive index
and might be faster, but namespace in states (being used for stateful bolts)
has `task id` hence changing hash function requires state resharding. Maybe
worth to see how much the difference is? We should provide state resharding
anyway to support the change of task parallelism. (though changing hash
function requires all of end-users to run it once.)
---