GitHub user roshannaik opened a pull request:

    https://github.com/apache/storm/pull/2241

    STORM-2306 : Messaging subsystem redesign. 

    Having spent a lot of time on this, I am happy to share some good news and 
some even better news with you. 
    
    Before venturing further, I should note that, to limit the scope of this PR, no attempt was made to improve ACK-ing mode perf. Although some big latency gains are seen in ACK mode, they are a side effect of the new messaging design; work remains to be done on ACK mode itself.
    
    Please see the design docs posted on the STORM-2306 JIRA for details on what is being done.
    
    So, first the good news .. a quick competitive evaluation:
    
    # 1) Competitive Perf evaluation : 
    
    Here is a quick comparison of Storm numbers taken on my laptop against numbers for similar/identical topologies published by Heron, Flink, and Apex. I shall provide just a rolled-up summary here and leave the detailed analysis for later.
    
    The Storm numbers here were taken on my MacBook Pro (2015) with 16 GB of RAM and a single 4-core Intel i7 chip.
    
    
    ### A) Compared To Heron and Flink:
    ------------------------------
     Heron recently published this blog post about the big perf improvements (~4-6x) they achieved:
    
https://blog.twitter.com/engineering/en_us/topics/open-source/2017/optimizing-twitter-heron.html
    They ran it on dual 12-core Intel Xeon chips (didn't say how many machines).
    
    They use a simplified word count topology, which I have emulated for comparison purposes and included as part of this PR (SimplifiedWordCountTopo).
    
    Flink also publishes numbers for a similar setup here:
    https://flink.apache.org/features.html#streaming
    
    Below are per-core throughput numbers.
    
    **[:HERON:]**
    Acking Disabled: per core = **~475 k/sec**.
    Acking Enabled:  per core = **~150 k/sec**. Latency = **30ms**
    
    **[:FLINK:]**
    Per core: **~1 mill/sec**
    
    **[:STORM:]**
    Acking Disabled: per core =   **2 mill/sec.**    (1 spout & 1 counter bolt)
    Acking Enabled:  per core = **0.6 mill/sec**, Latency = **0.73 ms**  (+1 
acker)
    
    
    **Takeaways:** 
    - Storm's with-ACK throughput is better than Heron's no-ACK throughput. 
    - Without ACKs, Storm is far ahead of Heron and also better than Flink. 
    - Storm's latency (in microseconds) is also significantly better than both (although this metric is better compared using runs across multiple machines). AFAICT, Flink is generally not known for good latency (as per Flink's own comparison with Storm: https://data-artisans.com/blog/high-throughput-low-latency-and-exactly-once-stream-processing-with-apache-flink).
    
    
    
    ### B) Compared to Apex:
    -----------------------------------
    Apex appears to be the best performer among the open-source lot, by a reasonably good margin. Some numbers they published in their early days (when it was called DataTorrent) were misleading/dubious IMO, but the newer numbers appear credible.
    
    Here we look at how fast inter spout/bolt communication can be, using an ultra-minimalist topology: a ConstSpout emits a short string to a DevNullBolt that discards the tuples it receives. This topo has been in storm-perf for some time now.
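    The shape of that benchmark can be sketched in plain Java — one thread emitting a constant string, another discarding whatever it receives. This is *not* Storm code (ConstSpout/DevNullBolt live in storm-perf); the stdlib queue and the tuple count below are purely illustrative of what "worker local throughput" measures:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Plain-Java analog of the ConstSpout -> DevNullBolt topology:
// one thread emits a constant string, another discards what it receives.
class ConstToDevNull {
    public static void main(String[] args) throws InterruptedException {
        final int TUPLES = 100_000;
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1024);

        Thread spout = new Thread(() -> {           // the "ConstSpout"
            try {
                for (int i = 0; i < TUPLES; i++) queue.put("some data");
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });

        final long[] received = {0};
        Thread bolt = new Thread(() -> {            // the "DevNullBolt"
            try {
                for (int i = 0; i < TUPLES; i++) { queue.take(); received[0]++; }
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });

        long start = System.nanoTime();
        spout.start(); bolt.start();
        spout.join(); bolt.join();
        double secs = (System.nanoTime() - start) / 1e9;

        System.out.println("received=" + received[0]);
        System.out.printf("throughput ~ %.0f tuples/sec%n", TUPLES / secs);
    }
}
```

    (A naive BlockingQueue hand-off like this tops out far below the numbers quoted here, which is exactly the gap the redesigned messaging subsystem addresses.)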
    
    Apex provides numbers for an identical setup (what they call "container local" performance) here:
    https://www.datatorrent.com/blog/blog-apex-performance-benchmark/
    
    Other than the fact that the Storm numbers were taken on my laptop, this is a good apples-to-apples comparison.
    
    **[:APEX:]**
    Container local Throughput : **~4.2 mill/sec**
    
    **[:STORM:]**
    Worker local throughput : **8.1 mill/sec**
    
    
    
    # 2) Core messaging Performance
    
    Now for the better news. The redesigned messaging subsystem is actually much faster and able to move messages between threads at an astounding rate:
    
    - **120 mill/sec**  (batchSz=1000, 2 producers writing to 1 consumer)
    - **67 mill/sec**   (batchSz=1000, 1 producer writing to 1 consumer)
    
    
    I have included JCQueuePerfTest.java in this PR to help get quick 
measurements from within the IDE.
    
    That naturally begs the question: why is Storm pushing only 8.1 mill/sec between a ConstSpout and a DevNullBolt? The short answer is that there are big bottlenecks in other parts of the code. In this PR I have tackled some of those bottlenecks, but many still remain. We are faster than the competition, but still have room to be much, much faster. If anyone is interested in pursuing these to push Storm's perf to the next level, I am happy to point them in the right direction.
    
    
    Again, please refer to the design docs in the JIRA for details on the new design and the rationale behind it.
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/roshannaik/storm STORM-2306m

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/storm/pull/2241.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #2241
    
----
commit 5c0db923ecd8e4e1ce0e325ee2fd0f25bae7b0c2
Author: Roshan Naik <roshan@hw13642.local>
Date:   2017-07-25T03:01:00Z

    Messaging susbsytem redesign. Rebased to latest master. Validated 
compilation and few simple topo runs. C->N: 8 mil/sec. 2.8 mil/sec with 2 
workers. 1 mil/sec with ack (30-800 micosec). C->ID->N: 6.2mill/sec.

----

