GitHub user roshannaik opened a pull request: https://github.com/apache/storm/pull/2241
STORM-2306: Messaging subsystem redesign.

Having spent a lot of time on this, I am happy to share some good news and some even better news with you. Before venturing further, I must add that, to limit the scope of this PR, no attempt was made to improve ACK-ing mode perf. Although there are some big latency gains seen in ACK mode, these are a side effect of the new messaging design, and work remains to be done to improve ACK mode. Please see the design docs posted on the STORM-2306 JIRA for details on what is being done.

So, first the good news ... a quick competitive evaluation:

# 1) Competitive Perf Evaluation

Here are some quick comparisons of Storm numbers taken on my laptop against numbers for similar/identical topologies published by Heron, Flink and Apex. I shall provide just a rolled-up summary here and leave the detailed analysis for later. The Storm numbers were run on my MacBook Pro (2015) with 16 GB RAM and a single 4-core Intel i7 chip.

### A) Compared to Heron and Flink

Heron recently published a blog about the big perf improvements (~4-6x) they achieved: https://blog.twitter.com/engineering/en_us/topics/open-source/2017/optimizing-twitter-heron.html

They ran it on dual 12-core Intel Xeon chips (they didn't say how many machines). They use a simplified word count topology that I have emulated for comparison purposes and included as part of this PR (SimplifiedWordCountTopo). Flink also publishes numbers for a similar setup here: https://flink.apache.org/features.html#streaming

Below are per-core throughput numbers.

**[:HERON:]**
- Acking Disabled: per core = **~475 k/sec**
- Acking Enabled: per core = **~150 k/sec**, Latency = **30 ms**

**[:FLINK:]**
- Per core: **~1 mill/sec**

**[:STORM:]**
- Acking Disabled: per core = **2 mill/sec** (1 spout & 1 counter bolt)
- Acking Enabled: per core = **0.6 mill/sec**, Latency = **0.73 ms** (+1 acker)

**Takeaways:**
- Storm's with-ACK throughput is better than Heron's no-ACK throughput.
- Without ACKs, Storm is far ahead of Heron and also better than Flink.
- Storm's latency (in microseconds) is also significantly better than both (although this metric is better compared with multiple machines in the run). AFAICT, Flink is generally not known for having good latency (as per Flink's own comparison with Storm: https://data-artisans.com/blog/high-throughput-low-latency-and-exactly-once-stream-processing-with-apache-flink).

### B) Compared to Apex

Apex appears to be the best performer among the open-source lot, by a reasonably good margin. Some numbers they published in their early days (when it was called DataTorrent) were misleading/dubious IMO, but the newer numbers appear credible.

Here we look at how fast inter spout/bolt communication can be, using an ultra-minimalist topology: a ConstSpout emits a short string to a DevNull bolt that discards the tuples it receives. This topo has been in storm-perf for some time now. Apex provides numbers for an identical setup ... what they call "container local" performance: https://www.datatorrent.com/blog/blog-apex-performance-benchmark/

Other than the fact that the Storm numbers were taken on my laptop, these numbers are a good apples-to-apples comparison.

**[:APEX:]** Container local throughput: **~4.2 mill/sec**

**[:STORM:]** Worker local throughput: **8.1 mill/sec**

# 2) Core Messaging Performance

Now for the better news. The redesigned messaging system is actually much faster and able to move messages between threads at an astounding rate:

- **120 mill/sec** (batchSz=1000, 2 producers writing to 1 consumer)
- **67 mill/sec** (batchSz=1000, 1 producer writing to 1 consumer)

I have included JCQueuePerfTest.java in this PR to help get quick measurements from within the IDE.

That naturally begs the question ... why is Storm pushing only 8.1 mill/sec between a ConstSpout and a DevNullBolt? The short answer is ...
there are big bottlenecks in other parts of the code. In this PR I have tackled some such bottlenecks, but many still remain. We are faster than the competition, but still have room to be much, much faster. If anyone is interested in pursuing these to push Storm's perf to the next level, I am happy to point them in the right direction.

Again, please refer to the design docs in the JIRA for details on the new design and the rationale behind it.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/roshannaik/storm STORM-2306m

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/storm/pull/2241.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #2241

----

commit 5c0db923ecd8e4e1ce0e325ee2fd0f25bae7b0c2
Author: Roshan Naik <roshan@hw13642.local>
Date: 2017-07-25T03:01:00Z

    Messaging subsystem redesign. Rebased to latest master. Validated compilation and a few simple topo runs. C->N: 8 mil/sec. 2.8 mil/sec with 2 workers. 1 mil/sec with ack (30-800 microsec). C->ID->N: 6.2 mill/sec.

----
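To make the "moving messages between threads" measurement concrete, here is a minimal sketch of the kind of inter-thread throughput microbenchmark described above. Everything in it is illustrative: it uses the JDK's `ArrayBlockingQueue` rather than this PR's JCQueue, and the class/method names are made up, so the absolute numbers will be far lower than JCQueuePerfTest's. It only shows the shape of the measurement, one producer thread handing messages to one consumer that drains in batches (emulating the batchSz idea).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;

// Hypothetical stand-in for JCQueuePerfTest. Measures how fast one producer
// thread can push messages to one consumer thread through a queue, with the
// consumer draining in batches. NOT the PR's JCQueue implementation.
public class QueuePerfSketch {

    /** Transfers {@code messages} ints producer -> consumer; returns count received. */
    static long runBenchmark(int messages, int batchSz) throws InterruptedException {
        ArrayBlockingQueue<Integer> q = new ArrayBlockingQueue<>(64 * 1024);

        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < messages; i++) q.put(i);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        List<Integer> batch = new ArrayList<>(batchSz);
        long received = 0;
        while (received < messages) {
            batch.add(q.take());                            // block for at least one message
            received += 1 + q.drainTo(batch, batchSz - 1);  // grab the rest of the batch
            batch.clear();
        }
        producer.join();
        return received;
    }

    public static void main(String[] args) throws InterruptedException {
        int messages = 1_000_000, batchSz = 1000;  // batchSz mirrors the PR's setting
        long start = System.nanoTime();
        long received = runBenchmark(messages, batchSz);
        double secs = (System.nanoTime() - start) / 1e9;
        System.out.printf("moved %d msgs at %.0f k/sec%n", received, received / secs / 1000);
    }
}
```

The batched `drainTo` amortizes per-message synchronization across up to batchSz messages, which is the same reason batchSz figures so prominently in the throughput numbers quoted above.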