Github user HeartSaVioR commented on a diff in the pull request:

    https://github.com/apache/storm/pull/2558#discussion_r168330302
  
    --- Diff: storm-client/src/jvm/org/apache/storm/utils/TupleUtils.java ---
    @@ -41,7 +41,7 @@ public static boolean isTick(Tuple tuple) {
         }
     
         public static <T> int chooseTaskIndex(List<T> keys, int numTasks) {
    -        return Math.abs(listHashCode(keys)) % numTasks;
    +        return Math.abs(listHashCode(keys) % numTasks);
    --- End diff --
    
    Since we are now on JDK 1.8, Math.floorMod(listHashCode(keys), numTasks) 
would be right implementation to make distribution be consistent with 1.x 
version line.
    
    
https://github.com/apache/storm/blob/a689d78f81e8783b442fc96ef2cbc7295c7a931d/storm-core/src/clj/org/apache/storm/daemon/executor.clj#L48-L56
    
    
https://docs.oracle.com/javase/8/docs/api/java/lang/Math.html#floorMod-int-int-
    https://clojuredocs.org/clojure.core/mod
    
    `Math.abs(listHashCode(keys) % numTasks)` != 
`Math.floorMod(listHashCode(keys), numTasks)`, but 
`Math.floorMod(listHashCode(keys), numTasks)` == `(mod listHashCode(keys) 
numTasks)`.
    
    Yes your implementation would be right about guaranteeing positive index 
and might be faster, but namespace in states (being used for stateful bolts) 
has `task id` hence changing hash function requires state resharding. Maybe 
worth to see how much the difference is? We should provide state resharding 
anyway to support the change of task parallelism. (though changing hash 
function requires all of end-users to run it once.)


---

Reply via email to