[ https://issues.apache.org/jira/browse/STORM-350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14976604#comment-14976604 ]

ASF GitHub Bot commented on STORM-350:
--------------------------------------

Github user revans2 commented on the pull request:

    https://github.com/apache/storm/pull/797#issuecomment-151548940
  
    @HeartSaVioR I am still not seeing any errors.
    
    I just pushed the exact version of the code that I tested with: dfba63e6e3768085a3545cf70c78993586409a25. I created 3 VMs similar to yours, but with less memory, because my VM quota on our IaaS didn't have that much memory left.
    
    ```
    $ cat /etc/redhat-release
    Red Hat Enterprise Linux Server release 6.3 (Santiago)
    $ java -version
    java version "1.7.0_51"
    $ cat /proc/cpuinfo
    processor       : 0
    vendor_id       : GenuineIntel
    cpu family      : 6
    model           : 13
    model name      : QEMU Virtual CPU version (cpu64-rhel6)
    stepping        : 3
    cpu MHz         : 2094.950
    cache size      : 4096 KB
    fpu             : yes
    fpu_exception   : yes
    cpuid level     : 4
    wp              : yes
    flags           : fpu de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pse36 clflush mmx fxsr sse sse2 syscall nx lm unfair_spinlock pni cx16 hypervisor lahf_lm
    bogomips        : 4189.90
    clflush size    : 64
    cache_alignment : 64
    address sizes   : 46 bits physical, 48 bits virtual
    power management:
    
    processor       : 1
    vendor_id       : GenuineIntel
    cpu family      : 6
    model           : 13
    model name      : QEMU Virtual CPU version (cpu64-rhel6)
    stepping        : 3
    cpu MHz         : 2094.950
    cache size      : 4096 KB
    fpu             : yes
    fpu_exception   : yes
    cpuid level     : 4
    wp              : yes
    flags           : fpu de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pse36 clflush mmx fxsr sse sse2 syscall nx lm unfair_spinlock pni cx16 hypervisor lahf_lm
    bogomips        : 4189.90
    clflush size    : 64
    cache_alignment : 64
    address sizes   : 46 bits physical, 48 bits virtual
    power management:
    ```
    
    I modified the perf-test to also print the number of failed tuples so I 
would not have to keep refreshing the UI all the time. I just pushed that to 
the perf-test repo 
https://github.com/yahoo/storm-perf-test/commit/96d65792be02d9d6f203dc20c329a40e6a9302f0
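
    For anyone following along, the change is roughly of this shape (a hedged sketch against the 0.11-era backtype.storm Thrift API, not the exact commit; the class and method names here are illustrative): sum the spout-level failed counts out of the executor stats inside the same polling loop that already prints the transferred counts.
    ```java
    import java.util.Map;

    import backtype.storm.generated.ExecutorSpecificStats;
    import backtype.storm.generated.ExecutorStats;
    import backtype.storm.generated.ExecutorSummary;
    import backtype.storm.generated.Nimbus;
    import backtype.storm.generated.TopologyInfo;
    import backtype.storm.generated.TopologySummary;

    public class FailedTupleCount {
        // Sum spout-level failed-tuple counts across all running topologies.
        // Assumes the caller already has a Nimbus Thrift client, as the
        // perf-test polling loop does.
        public static long totalFailed(Nimbus.Client client) throws Exception {
            long failed = 0;
            for (TopologySummary ts : client.getClusterInfo().get_topologies()) {
                TopologyInfo info = client.getTopologyInfo(ts.get_id());
                for (ExecutorSummary exec : info.get_executors()) {
                    ExecutorStats stats = exec.get_stats();
                    if (stats == null || stats.get_specific() == null) {
                        continue; // executor has not reported metrics yet
                    }
                    ExecutorSpecificStats specific = stats.get_specific();
                    if (specific.is_set_spout()) {
                        // window -> stream -> count; ":all-time" covers the whole run
                        Map<String, Long> allTime =
                            specific.get_spout().get_failed().get(":all-time");
                        if (allTime != null) {
                            for (Long count : allTime.values()) {
                                failed += count;
                            }
                        }
                    }
                }
            }
            return failed;
        }
    }
    ```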
    
    I ran the same command multiple times, each time with a very similar result.
    
    ```
    908  [main] WARN  b.s.u.NimbusClient - Using deprecated config nimbus.host for backward compatibility. Please update your storm.yaml so it only has config nimbus.seeds
    1084 [main] INFO  c.y.s.p.Main - Adding in 3 spouts
    1126 [main] INFO  c.y.s.p.Main - Adding in 4 bolts
    1267 [main] INFO  b.s.StormSubmitter - Generated ZooKeeper secret payload for MD5-digest: -5117308553277668052:-5039407269887585982
    1273 [main] INFO  b.s.s.a.AuthUtils - Got AutoCreds []
    1273 [main] WARN  b.s.u.NimbusClient - Using deprecated config nimbus.host for backward compatibility. Please update your storm.yaml so it only has config nimbus.seeds
    1310 [main] WARN  b.s.u.NimbusClient - Using deprecated config nimbus.host for backward compatibility. Please update your storm.yaml so it only has config nimbus.seeds
    1361 [main] WARN  b.s.u.NimbusClient - Using deprecated config nimbus.host for backward compatibility. Please update your storm.yaml so it only has config nimbus.seeds
    1403 [main] INFO  b.s.StormSubmitter - Uploading topology jar ./storm_perf_test-1.0.0-SNAPSHOT-jar-with-dependencies.jar to assigned location: /home/ME/apache-storm-0.11.0-SNAPSHOT/storm-local/nimbus/inbox/stormjar-e1d3ca41-9ffe-466d-81dd-33e7f90b8367.jar
    1420 [main] INFO  b.s.StormSubmitter - Successfully uploaded topology jar to assigned location: /home/ME/apache-storm-0.11.0-SNAPSHOT/storm-local/nimbus/inbox/stormjar-e1d3ca41-9ffe-466d-81dd-33e7f90b8367.jar
    1420 [main] INFO  b.s.StormSubmitter - Submitting topology test_0 in distributed mode with conf {"storm.zookeeper.topology.auth.scheme":"digest","storm.zookeeper.topology.auth.payload":"-5117308553277668052:-5039407269887585982","topology.workers":3,"topology.acker.executors":3,"topology.debug":false,"topology.max.spout.pending":1092}
    1530 [main] INFO  b.s.StormSubmitter - Finished submitting topology: test_0
    status  topologies      totalSlots      slotsUsed       totalExecutors  executorsWithMetrics    time    time-diff ms    transferred     throughput (MB/s)       total Failed
    WAITING 1       12      0       0       0       1445955023986   0       0       0.0     0
    WAITING 1       12      3       13      10      1445955053986   30000   69560   0.022112528483072918    0
    WAITING 1       12      3       13      10      1445955083986   30000   1012280 0.32179514567057294     0
    WAITING 1       12      3       13      11      1445955113986   30000   1346940 0.4281806945800781      0
    WAITING 1       12      3       13      11      1445955143986   30000   1419380 0.45120875040690106     0
    WAITING 1       12      3       13      11      1445955173986   30000   1315180 0.41808446248372394     0
    WAITING 1       12      3       13      11      1445955203987   30001   1324780 0.4211221828901276      0
    WAITING 1       12      3       13      11      1445955233986   29999   1377400 0.43787826374811456     0
    WAITING 1       12      3       13      11      1445955263986   30000   1432580 0.45540491739908856     0
    WAITING 1       12      3       13      11      1445955293987   30001   789260  0.25089063396780004     0
    WAITING 1       12      3       13      11      1445955323987   30000   1307960 0.41578928629557294     0
    WAITING 1       12      3       13      12      1445955353986   29999   1420280 0.451509903031924       0
    ...
    WAITING 1       12      3       13      13      1445955833986   30000   1330760 0.42303721110026044     0
    RUNNING 1       12      3       13      13      1445955863986   30000   1421300 0.45181910196940106     0
    RUNNING 1       12      3       13      13      1445955893986   30000   1403780 0.44624964396158856     0
    RUNNING 1       12      3       13      13      1445955923986   30000   1325140 0.4212506612141927      0
    RUNNING 1       12      3       13      13      1445955953986   30000   1328120 0.42219797770182294     0
    RUNNING 1       12      3       13      13      1445955983986   30000   1338720 0.425567626953125       0
    RUNNING 1       12      3       13      13      1445956013986   30000   1396000 0.4437764485677083      0
    RUNNING 1       12      3       13      13      1445956043986   30000   1400840 0.44531504313151044     0
    RUNNING 1       12      3       13      13      1445956073987   30001   1417260 0.4505198032298663      0
    RUNNING 1       12      3       13      13      1445956103986   29999   1238620 0.393759819256345       0
    RUNNING 1       12      3       13      13      1445956133986   30000   1400860 0.44532140096028644     0
    RUNNING 1       12      3       13      13      1445956163986   30000   1429200 0.4543304443359375      0
    RUNNING 1       12      3       13      13      1445956193986   30000   1339820 0.4259173075358073      0
    RUNNING 1       12      3       13      13      1445956223986   30000   1432580 0.45540491739908856     0
    RUNNING 1       12      3       13      13      1445956253986   30000   1373020 0.43647130330403644     0
    RUNNING 1       12      3       13      13      1445956283986   30000   1380660 0.4388999938964844      0
    RUNNING 1       12      3       13      13      1445956313986   30000   1489820 0.4736010233561198      0
    RUNNING 1       12      3       13      13      1445956343986   30000   1414680 0.44971466064453125     0
    RUNNING 1       12      3       13      13      1445956373987   30001   1403300 0.44608218666474136     0
    RUNNING 1       12      3       13      13      1445956403986   29999   1357460 0.4315392971595147      0
    RUNNING 1       12      3       13      13      1445956433986   30000   1418080 0.4507954915364583      0
    RUNNING 1       12      3       13      13      1445956463986   30000   1361900 0.4329363505045573      0
    RUNNING 1       12      3       13      13      1445956493987   30001   1376360 0.43751847676041006     0
    RUNNING 1       12      3       13      13      1445956523987   30000   979760  0.3114573160807292      0
    RUNNING 1       12      3       13      13      1445956553986   29999   1476040 0.4692361205334449      0
    RUNNING 1       12      3       13      13      1445956583986   30000   1394800 0.4433949788411458      0
    RUNNING 1       12      3       13      13      1445956613986   30000   1454180 0.46227137247721356     0
    RUNNING 1       12      3       13      13      1445956643986   30000   1420840 0.45167287190755206     0
    RUNNING 1       12      3       13      13      1445956673986   30000   1478660 0.47005335489908856     0
    RUNNING 1       12      3       13      13      1445956703987   30001   1402460 0.4458151667568112      0
    RUNNING 1       12      3       13      13      1445956733986   29999   1386660 0.44082203659718344     0
    1741535 [main] INFO  c.y.s.p.Main - KILLING test_0
    ```
    
    It takes a very long time for the event logger bolt to send enough metrics for the code to register that it is up and start the test, so each run usually lasts about 30 mins.  I got no failures anywhere during the testing.
    The following numbers were all collected on the nimbus/ui/zk/supervisor/logviewer node.  I also launched the topology from that node, so it would probably be the most overloaded.  All of the other nodes were running a supervisor/logviewer/zk.
    
    ```
    top - 14:29:54 up 57 min,  4 users,  load average: 5.21, 4.50, 3.23
    Tasks: 132 total,   2 running, 130 sleeping,   0 stopped,   0 zombie
    Cpu(s): 48.0%us, 42.2%sy,  1.7%ni,  0.2%id,  0.5%wa,  0.0%hi,  7.3%si,  0.2%st
    Mem:   4055232k total,  3161188k used,   894044k free,   123084k buffers
    ```
    
    This is the output of running zktop.py against the zk ensemble (IP addresses and host names changed).
    ```
    Ensemble -- nodecount:32 zxid:0x10000086b sessions:16
    
    ID SERVER           PORT M    OUTST    RECVD     SENT CONNS MINLAT AVGLAT MAXLAT
    0  NIMBUS  2181 F        0     3895     3901     5      0      1     33
    1  SUPER1  2181 L        0     2054     2054     3      0      1     22
    2  SUPER2  2181 F        0    11358    11376     8      0      0     29
    
    CLIENT           PORT S I   QUEUED    RECVD     SENT
    NIMBUS   54133 0 1        0      330      330
    NIMBUS   54629 0 1        0     1313     1313
    NIMBUS  55899 0 0        0        1        0
    SUPER1    41270 0 1        0     1305     1305
    NIMBUS   54130 0 1        0      511      511
    NIMBUS   49674 1 1        0      357      357
    NIMBUS   51440 1 0        0        1        0
    SUPER2   60290 1 1        0     1305     1305
    SUPER2   51100 2 1        0     1147     1151
    NIMBUS   53264 2 1        0     1148     1152
    NIMBUS   53088 2 1        0     7937     7940
    NIMBUS   54851 2 0        0        1        0
    SUPER1    60972 2 1        0      216      216
    NIMBUS   53263 2 1        0      217      217
    SUPER1    60973 2 1        0     1148     1152
    SUPER2   51098 2 1        0      216      216
    ```
    
    Everything there looks OK.
    
    Disk utilization while the test is running is between 0.6% and 0.7%.  Network maxed out at 23054.9 kbps in and 21443.2 kbps out.  Gigabit Ethernet can theoretically handle about 128 MB/sec in each direction, so the 22/23 MB/sec is well within that range.  If the link somehow dropped to 100 Mbit, that could possibly explain what you are seeing, but that is just a guess.
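
    For reference, the headroom arithmetic behind that claim, taking 1 Gbit as 1024 Mbit (which is where the 128 MB/sec figure comes from) and reading the measured peaks as roughly 22/23 MB/sec as above:
    ```
    \frac{1024\ \text{Mbit/s}}{8\ \text{bit/byte}} = 128\ \text{MB/s},
    \qquad
    \frac{23\ \text{MB/s}}{128\ \text{MB/s}} \approx 0.18
    ```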
    
    I have screen shots of the UI too, but I don't think they will add much.  I am now going to try to turn on the logging metrics consumer and see if I can get a measure of GC/etc. as another possible issue.
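
    For what it's worth, this is roughly how I would expect that to be wired up on the test topology conf (a sketch against the 0.11-era backtype.storm API; the class and method names below are illustrative): the built-in LoggingMetricsConsumer then receives the per-worker __system metrics (GC time/count, heap usage, etc.) and writes them to the metrics log on each supervisor node.
    ```java
    import backtype.storm.Config;
    import backtype.storm.metric.LoggingMetricsConsumer;

    public class EnableLoggingMetrics {
        // Register the built-in LoggingMetricsConsumer so worker __system metrics
        // (GC, heap, uptime) get written to the metrics log instead of only the UI.
        public static Config withLoggingMetrics(Config conf) {
            // One consumer task should be plenty for a 3-worker test topology.
            conf.registerMetricsConsumer(LoggingMetricsConsumer.class, 1);
            return conf;
        }
    }
    ```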
    
    storm.yaml
    ```
    worker.childopts: "-Dfile.encoding=UTF-8 -Xmx768m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/heapdump.hprof -Dcom.sun.management.jmxremote=true -Dcom.sun.management.jmxremote.port=1%ID% -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false"
    topology.backpressure.enable: false
    topology.transfer.buffer.size: 32
    topology.executor.send.buffer.size: 16384
    topology.executor.receive.buffer.size: 16384
    
    storm.zookeeper.servers:
      - "NIMBUS"
      - "SUPER1"
      - "SUPER2"
    
    nimbus.host: "NIMBUS"
    ```
    
    command line I ran
    ```
    ./apache-storm-0.11.0-SNAPSHOT/bin/storm jar ./storm_perf_test-1.0.0-SNAPSHOT-jar-with-dependencies.jar com.yahoo.storm.perftest.Main --ack --ackers 3 --bolt 4 --name test -l 1 -n 1 --workers 3 --spout 3 --testTimeSec 900 -c topology.max.spout.pending=1092 --messageSize 10 | tee run.txt
    ```


> Update disruptor to latest version (3.2.1)
> ------------------------------------------
>
>                 Key: STORM-350
>                 URL: https://issues.apache.org/jira/browse/STORM-350
>             Project: Apache Storm
>          Issue Type: Dependency upgrade
>          Components: storm-core
>            Reporter: Boris Aksenov
>            Assignee: Boris Aksenov
>            Priority: Minor
>             Fix For: 0.10.0
>
>         Attachments: 
> 20141117-0.9.3-rc1-3-worker-separate-1-spout-and-2-bolts-failing-tuples.png, 
> 20141117-0.9.3-rc1-one-worker-failing-tuples.png, 
> 20141117-0.9.3-rc1-three-workers-1-spout-3-bolts-failing-tuples.png, 
> 20141118-0.9.3-branch-3-worker-separate-1-spout-and-2-bolts-ok.png, 
> 20141118-0.9.3-branch-one-worker-ok.png, 
> 20141118-0.9.3-branch-three-workers-1-spout-3-bolts-ok.png, Storm UI1.pdf, 
> Storm UI2.pdf, storm-0.9.3-rc1-failing-tuples.png, 
> storm-0_9_2-incubating-failing-tuples.png, 
> storm-0_9_2-incubating-no-failing-tuples.png, 
> storm-failed-tuples-multi-node.png, storm-multi-node-without-350.png
>
>



