Github user clockfly commented on the pull request:
https://github.com/apache/incubator-storm/pull/103#issuecomment-45052822
Hi @Gvain,
First, thank you for your persistence! It really helps us gain a deeper
understanding of Storm.
I reran the test with your approach, and the results confirm your claim:
the throughput does increase as the worker count increases.
Test
-------------
**Throughput vs. worker# †**

| Worker# | cluster network IN(MB/s) | spout throughput(msg/s) | overall CPU | user cpu | system cpu | total threads# ‡ |
| ------- |:------------------------:| -----------------------:| -----------:| --------:| ----------:| ----------------:|
| 4 | 72 | 391860 | 42% | 36% | 6%| 422 |
| 8 | 88.3 | 475007 | 42% | 35% | 7% | 554 |
| 12 | 104 | 555174 | 51% | 40% | 11% | 686 |
| 16 | 116 | 603399 | 59% | 46% | 13% | 818 |
| 24 | 130 | 622938 | 77% | 55% | 22% | 1082 |
| 4 (with STORM-297) | 140 | 752479 | 74% | 67.8% | 5.2% | 434 |
† Test environment: 4 nodes, 48 vcores on each machine (CPU: E5-2680),
max.spout.pending = 1000, 48 spouts, 48 bolts, and 48 ackers.
‡ We count only 1 GC thread and 1 JIT thread for each JVM.
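
For reference, here is a minimal sketch of how a topology with these
settings could be wired up. The spout comes from Storm's bundled testing
utilities; `NoOpBolt` and the topology name are hypothetical placeholders,
while the `Config` calls are the standard API for the settings above:

```java
import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.testing.TestWordSpout;
import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Tuple;

public class ThroughputTest {

    // Hypothetical sink bolt: BaseBasicBolt acks each tuple automatically,
    // and we emit nothing downstream.
    public static class NoOpBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple input, BasicOutputCollector collector) {
            // intentionally empty: we only measure spout throughput and acking
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // no output fields
        }
    }

    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("spout", new TestWordSpout(), 48);                   // 48 spouts
        builder.setBolt("bolt", new NoOpBolt(), 48).shuffleGrouping("spout"); // 48 bolts

        Config conf = new Config();
        conf.setNumWorkers(4);         // varied across runs: 4, 8, 12, 16, 24
        conf.setNumAckers(48);         // 48 ackers
        conf.setMaxSpoutPending(1000); // max.spout.pending = 1000

        StormSubmitter.submitTopology("throughput-test", conf, builder.createTopology());
    }
}
```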

**CPU Usage**

We can see that both throughput and CPU usage increase as the worker count
increases.
Analysis
-------------------
The facts revealed by this test strengthen my conviction that applying
this patch is the better way to get higher performance:
1. **Higher performance with this patch.**
| | Worker# | cluster network IN(MB/s) | spout throughput(msg/s) |
| ------------- |:-------:| ------------------------:| -----------------------:|
| no STORM-297 | 24 | 130 | 622938 |
| with STORM-297 | 4 | 140 | 752479 |
**4 workers** with STORM-297 can process **20% more** messages than **24
workers** without this patch, with **less** CPU consumption.
2. **Storm cannot scale well by raising task parallelism alone.**
As the data in your test showed, with 4 workers we can only reach 56% CPU,
even though each worker runs 36 tasks, far more than the 24 CPU cores.
**The worker has inherent bottlenecks; the issues are real, and
work-arounds won't fix them.**
3. **CPU system time increases as the worker count increases.**
| Worker# |user cpu|system cpu|
| ------------- | -----:|-----:|
| 4 | 36% | 6%|
| 8 | 35% | 7% |
| 12 | 40% | 11% |
| 16 | 46% | 13% |
| 24 | 55% | **22%** |
| 4 (with STORM-297) | 67.8% | 5.2% |
Abnormally high system CPU usage is not good: it means more time is spent
in the kernel rather than doing useful work.
4. **JVM allocation cost is high.**
In the test, each allocated JVM runs 16+ internal threads. The supporting
data can be found at
5. **Worker allocation is not cost-free.**
Besides the JVM allocation cost, each worker carries ZooKeeper-related
threads (4), heartbeat threads (5), the system bolt (2), and Netty boss and
worker threads (6: 2 boss, 2 worker, and 2 timer). Together with the JVM
threads, each worker adds at least 33 threads (see the fit sketched after
this list).
More threads put more pressure on central services like Nimbus and
ZooKeeper.
More threads also mean more context switches, which hurts the performance
of all applications running on these servers.
| Worker# |cluster total threads#|
| ------------- |:-------------:|
| 4 | 422 |
| 8 | 554 |
| 12 | 686 |
| 16 | 818 |
| 24 | **1082** |
| 4 (with STORM-297) | 434 |
(We count only 1 GC thread and 1 JIT thread for each JVM.)
In the 24-worker case, the overall thread count is 1082+, with **each
machine carrying more than 270 threads!**
6. **Serialization and deserialization cost.**
When a message is delivered to a task in the same process, the tuple is
not serialized.
With 4 workers (assuming shuffle grouping and evenly distributed tasks),
1/4 of the messages need no serialization; with 24 workers that ratio drops
to 1/24, which means we now have to serialize roughly 28% more messages
(see the worked arithmetic after this list).
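
To make the 28% figure in item 6 concrete, here is the arithmetic under
the stated even-distribution assumption: with W workers, the fraction of
messages that must cross process boundaries, and therefore be serialized,
is (W - 1)/W.

```
serialized fraction = (W - 1) / W
W = 4   ->  3/4   of messages serialized
W = 24  ->  23/24 of messages serialized
(23/24) / (3/4) = 23/18 ≈ 1.28  ->  ~28% more messages to serialize
```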
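
Likewise, the per-worker thread overhead in item 5 can be read straight
off the threads table: a back-of-the-envelope linear fit (mine, using only
the numbers above) reproduces every measured row exactly.

```
total threads ≈ 290 + 33 × worker#
check: 290 + 33 × 4  = 422     290 + 33 × 8  = 554
       290 + 33 × 16 = 818     290 + 33 × 24 = 1082
```

The slope confirms the "at least 33 threads per worker" estimate; the
~290-thread intercept is presumably the cluster-wide baseline that exists
regardless of worker count.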