[ https://issues.apache.org/jira/browse/STORM-350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14976604#comment-14976604 ]
ASF GitHub Bot commented on STORM-350:
--------------------------------------
Github user revans2 commented on the pull request:
https://github.com/apache/storm/pull/797#issuecomment-151548940
@HeartSaVioR I am still not seeing any errors.
I just pushed the exact version of the code that I tested with: dfba63e6e3768085a3545cf70c78993586409a25. I created 3 VMs that are similar to yours, but with less memory, because my VM quota on our IaaS didn't have that much memory left.
```
$ cat /etc/redhat-release
Red Hat Enterprise Linux Server release 6.3 (Santiago)
$ java -version
java version "1.7.0_51"
$ cat /proc/cpuinfo
processor : 0
vendor_id : GenuineIntel
cpu family : 6
model : 13
model name : QEMU Virtual CPU version (cpu64-rhel6)
stepping : 3
cpu MHz : 2094.950
cache size : 4096 KB
fpu : yes
fpu_exception : yes
cpuid level : 4
wp : yes
flags : fpu de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pse36 clflush mmx fxsr sse sse2 syscall nx lm unfair_spinlock pni cx16 hypervisor lahf_lm
bogomips : 4189.90
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:
processor : 1
vendor_id : GenuineIntel
cpu family : 6
model : 13
model name : QEMU Virtual CPU version (cpu64-rhel6)
stepping : 3
cpu MHz : 2094.950
cache size : 4096 KB
fpu : yes
fpu_exception : yes
cpuid level : 4
wp : yes
flags : fpu de pse tsc msr pae mce cx8 apic mtrr pge mca cmov pse36 clflush mmx fxsr sse sse2 syscall nx lm unfair_spinlock pni cx16 hypervisor lahf_lm
bogomips : 4189.90
clflush size : 64
cache_alignment : 64
address sizes : 46 bits physical, 48 bits virtual
power management:
```
I modified the perf-test to also print the number of failed tuples so I
would not have to keep refreshing the UI all the time. I just pushed that to
the perf-test repo
https://github.com/yahoo/storm-perf-test/commit/96d65792be02d9d6f203dc20c329a40e6a9302f0
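For reference, the change amounts to summing failure counts out of the Nimbus Thrift stats. A rough sketch against the 0.11-era backtype.storm generated API (not the exact patch; it only walks bolt executors here, spout-side fails could be summed the same way):
```
import java.util.Map;

import backtype.storm.generated.ClusterSummary;
import backtype.storm.generated.ExecutorSummary;
import backtype.storm.generated.GlobalStreamId;
import backtype.storm.generated.Nimbus;
import backtype.storm.generated.TopologyInfo;
import backtype.storm.generated.TopologySummary;
import backtype.storm.utils.NimbusClient;
import backtype.storm.utils.Utils;

public class FailedTupleCount {
    // Sum failed-tuple counts across all bolt executors of every topology.
    // A sketch against the Thrift-generated client, not the actual perf-test patch.
    public static long totalFailed() throws Exception {
        Map conf = Utils.readStormConfig();
        Nimbus.Client client = NimbusClient.getConfiguredClient(conf).getClient();
        long failed = 0;
        ClusterSummary cluster = client.getClusterInfo();
        for (TopologySummary topo : cluster.get_topologies()) {
            TopologyInfo info = client.getTopologyInfo(topo.get_id());
            for (ExecutorSummary exec : info.get_executors()) {
                if (exec.get_stats() == null || !exec.get_stats().get_specific().is_set_bolt()) {
                    continue;
                }
                // get_failed() is keyed by time window, then by input stream.
                Map<String, Map<GlobalStreamId, Long>> byWindow =
                        exec.get_stats().get_specific().get_bolt().get_failed();
                Map<GlobalStreamId, Long> allTime = byWindow.get(":all-time");
                if (allTime != null) {
                    for (Long count : allTime.values()) {
                        failed += count;
                    }
                }
            }
        }
        return failed;
    }
}
```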
I ran the same command multiple times, each time with a very similar result.
```
908 [main] WARN b.s.u.NimbusClient - Using deprecated config nimbus.host for backward compatibility. Please update your storm.yaml so it only has config nimbus.seeds
1084 [main] INFO c.y.s.p.Main - Adding in 3 spouts
1126 [main] INFO c.y.s.p.Main - Adding in 4 bolts
1267 [main] INFO b.s.StormSubmitter - Generated ZooKeeper secret payload for MD5-digest: -5117308553277668052:-5039407269887585982
1273 [main] INFO b.s.s.a.AuthUtils - Got AutoCreds []
1273 [main] WARN b.s.u.NimbusClient - Using deprecated config nimbus.host for backward compatibility. Please update your storm.yaml so it only has config nimbus.seeds
1310 [main] WARN b.s.u.NimbusClient - Using deprecated config nimbus.host for backward compatibility. Please update your storm.yaml so it only has config nimbus.seeds
1361 [main] WARN b.s.u.NimbusClient - Using deprecated config nimbus.host for backward compatibility. Please update your storm.yaml so it only has config nimbus.seeds
1403 [main] INFO b.s.StormSubmitter - Uploading topology jar ./storm_perf_test-1.0.0-SNAPSHOT-jar-with-dependencies.jar to assigned location: /home/ME/apache-storm-0.11.0-SNAPSHOT/storm-local/nimbus/inbox/stormjar-e1d3ca41-9ffe-466d-81dd-33e7f90b8367.jar
1420 [main] INFO b.s.StormSubmitter - Successfully uploaded topology jar to assigned location: /home/ME/apache-storm-0.11.0-SNAPSHOT/storm-local/nimbus/inbox/stormjar-e1d3ca41-9ffe-466d-81dd-33e7f90b8367.jar
1420 [main] INFO b.s.StormSubmitter - Submitting topology test_0 in distributed mode with conf {"storm.zookeeper.topology.auth.scheme":"digest","storm.zookeeper.topology.auth.payload":"-5117308553277668052:-5039407269887585982","topology.workers":3,"topology.acker.executors":3,"topology.debug":false,"topology.max.spout.pending":1092}
1530 [main] INFO b.s.StormSubmitter - Finished submitting topology: test_0
status  topologies totalSlots slotsUsed totalExecutors executorsWithMetrics time          time-diff ms transferred throughput (MB/s)    total Failed
WAITING 1          12         0         0              0                    1445955023986 0            0           0.0                  0
WAITING 1          12         3         13             10                   1445955053986 30000        69560       0.022112528483072918 0
WAITING 1          12         3         13             10                   1445955083986 30000        1012280     0.32179514567057294  0
WAITING 1          12         3         13             11                   1445955113986 30000        1346940     0.4281806945800781   0
WAITING 1          12         3         13             11                   1445955143986 30000        1419380     0.45120875040690106  0
WAITING 1          12         3         13             11                   1445955173986 30000        1315180     0.41808446248372394  0
WAITING 1          12         3         13             11                   1445955203987 30001        1324780     0.4211221828901276   0
WAITING 1          12         3         13             11                   1445955233986 29999        1377400     0.43787826374811456  0
WAITING 1          12         3         13             11                   1445955263986 30000        1432580     0.45540491739908856  0
WAITING 1          12         3         13             11                   1445955293987 30001        789260      0.25089063396780004  0
WAITING 1          12         3         13             11                   1445955323987 30000        1307960     0.41578928629557294  0
WAITING 1          12         3         13             12                   1445955353986 29999        1420280     0.451509903031924    0
...
WAITING 1          12         3         13             13                   1445955833986 30000        1330760     0.42303721110026044  0
RUNNING 1          12         3         13             13                   1445955863986 30000        1421300     0.45181910196940106  0
RUNNING 1          12         3         13             13                   1445955893986 30000        1403780     0.44624964396158856  0
RUNNING 1          12         3         13             13                   1445955923986 30000        1325140     0.4212506612141927   0
RUNNING 1          12         3         13             13                   1445955953986 30000        1328120     0.42219797770182294  0
RUNNING 1          12         3         13             13                   1445955983986 30000        1338720     0.425567626953125    0
RUNNING 1          12         3         13             13                   1445956013986 30000        1396000     0.4437764485677083   0
RUNNING 1          12         3         13             13                   1445956043986 30000        1400840     0.44531504313151044  0
RUNNING 1          12         3         13             13                   1445956073987 30001        1417260     0.4505198032298663   0
RUNNING 1          12         3         13             13                   1445956103986 29999        1238620     0.393759819256345    0
RUNNING 1          12         3         13             13                   1445956133986 30000        1400860     0.44532140096028644  0
RUNNING 1          12         3         13             13                   1445956163986 30000        1429200     0.4543304443359375   0
RUNNING 1          12         3         13             13                   1445956193986 30000        1339820     0.4259173075358073   0
RUNNING 1          12         3         13             13                   1445956223986 30000        1432580     0.45540491739908856  0
RUNNING 1          12         3         13             13                   1445956253986 30000        1373020     0.43647130330403644  0
RUNNING 1          12         3         13             13                   1445956283986 30000        1380660     0.4388999938964844   0
RUNNING 1          12         3         13             13                   1445956313986 30000        1489820     0.4736010233561198   0
RUNNING 1          12         3         13             13                   1445956343986 30000        1414680     0.44971466064453125  0
RUNNING 1          12         3         13             13                   1445956373987 30001        1403300     0.44608218666474136  0
RUNNING 1          12         3         13             13                   1445956403986 29999        1357460     0.4315392971595147   0
RUNNING 1          12         3         13             13                   1445956433986 30000        1418080     0.4507954915364583   0
RUNNING 1          12         3         13             13                   1445956463986 30000        1361900     0.4329363505045573   0
RUNNING 1          12         3         13             13                   1445956493987 30001        1376360     0.43751847676041006  0
RUNNING 1          12         3         13             13                   1445956523987 30000        979760      0.3114573160807292   0
RUNNING 1          12         3         13             13                   1445956553986 29999        1476040     0.4692361205334449   0
RUNNING 1          12         3         13             13                   1445956583986 30000        1394800     0.4433949788411458   0
RUNNING 1          12         3         13             13                   1445956613986 30000        1454180     0.46227137247721356  0
RUNNING 1          12         3         13             13                   1445956643986 30000        1420840     0.45167287190755206  0
RUNNING 1          12         3         13             13                   1445956673986 30000        1478660     0.47005335489908856  0
RUNNING 1          12         3         13             13                   1445956703987 30001        1402460     0.4458151667568112   0
RUNNING 1          12         3         13             13                   1445956733986 29999        1386660     0.44082203659718344  0
1741535 [main] INFO c.y.s.p.Main - KILLING test_0
```
It takes a very long time for the event logger bolt to send enough metrics for the code to register that it is up and start the test, so each test usually lasts about 30 minutes. I got no failures anywhere during the testing.
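The WAITING to RUNNING flip in the output above is that gate: the perf-test polls Nimbus until every executor has reported stats at least once. Roughly, the loop looks like this sketch (my reading of the behavior, not the actual perf-test code):
```
import backtype.storm.generated.ExecutorSummary;
import backtype.storm.generated.Nimbus;
import backtype.storm.generated.TopologyInfo;

public class WaitForMetrics {
    // Block until every executor of the topology has reported stats at least
    // once; only then does the timed measurement window start.
    static void waitUntilReporting(Nimbus.Client client, String topologyId)
            throws Exception {
        while (true) {
            int total = 0;
            int withMetrics = 0;
            TopologyInfo info = client.getTopologyInfo(topologyId);
            for (ExecutorSummary exec : info.get_executors()) {
                total++;
                if (exec.get_stats() != null) {
                    withMetrics++;
                }
            }
            if (total > 0 && withMetrics >= total) {
                return;
            }
            Thread.sleep(30 * 1000); // same 30s cadence as the status lines above
        }
    }
}
```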
The following numbers were all collected on the nimbus/ui/zk/supervisor/logviewer node. I also launched the topology from that node, so it is probably the most overloaded one. All the other nodes were each running a supervisor/logviewer/zk.
```
top - 14:29:54 up 57 min, 4 users, load average: 5.21, 4.50, 3.23
Tasks: 132 total, 2 running, 130 sleeping, 0 stopped, 0 zombie
Cpu(s): 48.0%us, 42.2%sy, 1.7%ni, 0.2%id, 0.5%wa, 0.0%hi, 7.3%si, 0.2%st
Mem: 4055232k total, 3161188k used, 894044k free, 123084k buffers
```
This is the output of running zktop.py against the ZK ensemble (IP addresses and host names changed).
```
Ensemble -- nodecount:32 zxid:0x10000086b sessions:16
ID SERVER PORT M OUTST RECVD SENT CONNS MINLAT AVGLAT MAXLAT
0 NIMBUS 2181 F 0 3895 3901 5 0 1 33
1 SUPER1 2181 L 0 2054 2054 3 0 1 22
2 SUPER2 2181 F 0 11358 11376 8 0 0 29
CLIENT PORT S I QUEUED RECVD SENT
NIMBUS 54133 0 1 0 330 330
NIMBUS 54629 0 1 0 1313 1313
NIMBUS 55899 0 0 0 1 0
SUPER1 41270 0 1 0 1305 1305
NIMBUS 54130 0 1 0 511 511
NIMBUS 49674 1 1 0 357 357
NIMBUS 51440 1 0 0 1 0
SUPER2 60290 1 1 0 1305 1305
SUPER2 51100 2 1 0 1147 1151
NIMBUS 53264 2 1 0 1148 1152
NIMBUS 53088 2 1 0 7937 7940
NIMBUS 54851 2 0 0 1 0
SUPER1 60972 2 1 0 216 216
NIMBUS 53263 2 1 0 217 217
SUPER1 60973 2 1 0 1148 1152
SUPER2 51098 2 1 0 216 216
```
Everything there looks OK.
Disk utilization while the test is running is between 0.6% and 0.7%.
Network traffic maxed out at 23054.9 kB/sec in and 21443.2 kB/sec out. Gigabit should theoretically be able to handle about 128 MB/sec in and out, so the 22-23 MB/sec observed is well within that range. If the connection somehow slipped to 100 Mbit, that could possibly explain what you are seeing, but it is just a guess.
I have screen shots of the UI too, but I don't think they would add much. I am now going to try turning on the logging metrics consumer to see if I can get a measure of GC etc. as another possible issue.
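If it helps to compare runs, registering that consumer per topology is a one-liner; a minimal sketch using the 0.11-era backtype.storm names (the built-in system metrics, including GC time and heap usage, end up in the workers' metrics log):
```
import backtype.storm.Config;
import backtype.storm.metric.LoggingMetricsConsumer;

public class MetricsSetup {
    // One consumer task is enough for a topology this small.
    static Config withLoggingMetrics(Config conf) {
        conf.registerMetricsConsumer(LoggingMetricsConsumer.class, 1);
        return conf;
    }
}
```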
storm.yaml
```
worker.childopts: "-Dfile.encoding=UTF-8 -Xmx768m -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp/heapdump.hprof -Dcom.sun.management.jmxremote=true -Dcom.sun.management.jmxremote.port=1%ID% -Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false"
topology.backpressure.enable: false
topology.transfer.buffer.size: 32
topology.executor.send.buffer.size: 16384
topology.executor.receive.buffer.size: 16384
storm.zookeeper.servers:
- "NIMBUS"
- "SUPER1"
- "SUPER2"
nimbus.host: "NIMBUS"
```
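For completeness, the same queue sizing can also be pinned per topology from the submitter instead of cluster-wide. A minimal sketch using the 0.11-era backtype.storm.Config constants (the backpressure key is passed as a raw string, since the named constant only exists on branches that carry the backpressure work):
```
import backtype.storm.Config;

public class PerfTestConf {
    static Config build() {
        Config conf = new Config();
        conf.setNumWorkers(3);
        conf.put(Config.TOPOLOGY_MAX_SPOUT_PENDING, 1092);
        // Same disruptor queue sizing as the storm.yaml above.
        conf.put(Config.TOPOLOGY_TRANSFER_BUFFER_SIZE, 32);
        conf.put(Config.TOPOLOGY_EXECUTOR_SEND_BUFFER_SIZE, 16384);
        conf.put(Config.TOPOLOGY_EXECUTOR_RECEIVE_BUFFER_SIZE, 16384);
        // Raw key: only meaningful where the backpressure patch has landed.
        conf.put("topology.backpressure.enable", false);
        return conf;
    }
}
```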
The command line I ran:
```
./apache-storm-0.11.0-SNAPSHOT/bin/storm jar \
  ./storm_perf_test-1.0.0-SNAPSHOT-jar-with-dependencies.jar \
  com.yahoo.storm.perftest.Main \
  --ack --ackers 3 --bolt 4 --name test -l 1 -n 1 \
  --workers 3 --spout 3 --testTimeSec 900 \
  -c topology.max.spout.pending=1092 --messageSize 10 | tee run.txt
```
> Update disruptor to latest version (3.2.1)
> ------------------------------------------
>
> Key: STORM-350
> URL: https://issues.apache.org/jira/browse/STORM-350
> Project: Apache Storm
> Issue Type: Dependency upgrade
> Components: storm-core
> Reporter: Boris Aksenov
> Assignee: Boris Aksenov
> Priority: Minor
> Fix For: 0.10.0
>
> Attachments:
> 20141117-0.9.3-rc1-3-worker-separate-1-spout-and-2-bolts-failing-tuples.png,
> 20141117-0.9.3-rc1-one-worker-failing-tuples.png,
> 20141117-0.9.3-rc1-three-workers-1-spout-3-bolts-failing-tuples.png,
> 20141118-0.9.3-branch-3-worker-separate-1-spout-and-2-bolts-ok.png,
> 20141118-0.9.3-branch-one-worker-ok.png,
> 20141118-0.9.3-branch-three-workers-1-spout-3-bolts-ok.png, Storm UI1.pdf,
> Storm UI2.pdf, storm-0.9.3-rc1-failing-tuples.png,
> storm-0_9_2-incubating-failing-tuples.png,
> storm-0_9_2-incubating-no-failing-tuples.png,
> storm-failed-tuples-multi-node.png, storm-multi-node-without-350.png
>
>