GitHub user HeartSaVioR opened a pull request: https://github.com/apache/storm/pull/2261
STORM-2678 Improve performance of LoadAwareShuffleGrouping * construct ring which represents distribution of tasks based on load * chooseTasks() just accesses the ring sequentially * port related tests from Clojure to Java I'm not expert of micro-benchmark but I also craft some of simple performance tests which you can see them from LoadAwareShuffleGroupingTest. They are `testBenchmarkLoadAwareShuffleGroupingEvenLoad` and `testBenchmarkLoadAwareShuffleGroupingUnevenLoad `, and I put `@Ignore` to avoid running unless we want to do performance test on. Here's my observation on running them, using old and new LoadAwareShuffleGrouping: > testBenchmarkLoadAwareShuffleGroupingEvenLoad (old) Duration: 114470 ms Duration: 115973 ms Duration: 114807 ms > testBenchmarkLoadAwareShuffleGroupingEvenLoad (new) Duration: 106819 ms Duration: 105857 ms Duration: 106789 ms > testBenchmarkLoadAwareShuffleGroupingUnevenLoad (old) Duration: 113484 ms Duration: 118152 ms Duration: 112664 ms > testBenchmarkLoadAwareShuffleGroupingUnevenLoad (new) Duration: 106071 ms Duration: 105938 ms Duration: 106115 ms You can see that modified LoadAwareShuffleGrouping is faster than before, 5% or more for single threaded access. Maybe would want to do multi-threading performance test, with keeping in mind that accessing OutputCollector with single-thread is preferred over multi-threads. This still respects thread-safety, and I think respecting thread-safety is better than before, given that we only allow one thread to update the ring, and we replace the new information at once, not updating information on the fly which other threads are referencing. We still don't guard information with mutual-exclusion manner, but I think it is tolerable like we do before. I'm planning to explore some more, mostly about reducing call System.currentTimeMillis() in chooseTasks(). I'll put additional commits if I find any more improvements: it will be easy to revert some if we don't want to. You can merge this pull request into a Git repository by running: $ git pull https://github.com/HeartSaVioR/storm STORM-2678 Alternatively you can review and apply these changes as the patch at: https://github.com/apache/storm/pull/2261.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #2261 ---- commit 6c693e3a2d57cca3648240335ef6e1862dbbfc4c Author: Jungtaek Lim <kabh...@gmail.com> Date: 2017-08-04T04:25:27Z STORM-2678 Improve performance of LoadAwareShuffleGrouping * construct ring which represents distribution of tasks based on load * chooseTasks() just accesses the ring sequentially * port related tests from Clojure to Java ---- --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---