Hi all,

I manually deployed a Clearwater infrastructure with one each of bono, sprout, ellis, ralf, homer, and hs. My SIP testing seems to be working fine, but my stress testing is not working at all.
I created a new node for SIP testing, following https://github.com/Metaswitch/crest/blob/dev/docs/Bulk-Provisioning%20Numbers.md

Note: Whenever I check what might be wrong, I get a different status for my nodes. The Clearwater cluster_manager seems to fail most of the time on bono and sprout, and the cluster health results differ from one check to the next. For instance, when I run clearwater-etcdctl cluster-health, the cluster is reported as healthy, but sometimes my bono node (172-16-1-20) or sprout node (172-16-1-20) is reported as unhealthy.

[17:07:14][sprout]user@cw-002:/var/log/sprout$ clearwater-etcdctl cluster-health
cluster is healthy
member 27a940d2104e9692 is unhealthy
member 2ea8f3a5eea05584 is healthy
member 5fdc25bd4ae527c0 is healthy
member d26088cb54745bbc is healthy
member e525c6a4ed161686 is healthy
member f5765a98a56e9c4a is healthy

[17:07:24]user@cw-002:/var/log/sprout$ clearwater-etcdctl member list
27a940d2104e9692: name=172-16-1-20 peerURLs=http://172.16.1.20:2380 clientURLs=http://172.16.1.20:4000
2ea8f3a5eea05584: name=172-16-1-22 peerURLs=http://172.16.1.22:2380 clientURLs=http://172.16.1.22:4000
5fdc25bd4ae527c0: name=172-16-1-25 peerURLs=http://172.16.1.25:2380 clientURLs=http://172.16.1.25:4000
d26088cb54745bbc: name=172-16-1-24 peerURLs=http://172.16.1.24:2380 clientURLs=http://172.16.1.24:4000
e525c6a4ed161686: name=172-16-1-21 peerURLs=http://172.16.1.21:2380 clientURLs=http://172.16.1.21:4000
f5765a98a56e9c4a: name=172-16-1-23 peerURLs=http://172.16.1.23:2380 clientURLs=http://172.16.1.23:4000

[17:10:00][sprout]user@cw-002:/var/log/sprout$ clearwater-etcdctl cluster-health
cluster is healthy
member 27a940d2104e9692 is healthy
member 2ea8f3a5eea05584 is healthy
member 5fdc25bd4ae527c0 is healthy
member d26088cb54745bbc is healthy
member e525c6a4ed161686 is healthy
member f5765a98a56e9c4a is healthy

I have attached my SIP stress logs and my sprout logs. I ran the test between 13:57 and 14:05 on 22 October.
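To see which node is flapping without matching member IDs by hand, the IDs in the cluster-health output can be mapped back to the names in the member list output. The following is only a minimal Python sketch: it parses the plain-text format pasted above, and the function names (parse_member_names, unhealthy_members) are hypothetical helpers, not part of Clearwater or etcd. Other etcdctl versions may print extra detail per line, which this sketch does not try to interpret.

```python
import re


def parse_member_names(member_list_output):
    """Map member ID -> name from the text of `clearwater-etcdctl member list`."""
    names = {}
    # Assumes lines shaped like "<hex id>: name=<name> peerURLs=..." as pasted above.
    for match in re.finditer(r"([0-9a-f]+): name=(\S+)", member_list_output):
        names[match.group(1)] = match.group(2)
    return names


def unhealthy_members(health_output, names):
    """Return the names (or raw IDs if unknown) of members reported unhealthy."""
    ids = re.findall(r"member ([0-9a-f]+) is unhealthy", health_output)
    return [names.get(member_id, member_id) for member_id in ids]


if __name__ == "__main__":
    # Sample text copied from the output above (shortened to two members).
    member_list = (
        "27a940d2104e9692: name=172-16-1-20 peerURLs=http://172.16.1.20:2380 "
        "clientURLs=http://172.16.1.20:4000\n"
        "2ea8f3a5eea05584: name=172-16-1-22 peerURLs=http://172.16.1.22:2380 "
        "clientURLs=http://172.16.1.22:4000\n"
    )
    health = (
        "cluster is healthy\n"
        "member 27a940d2104e9692 is unhealthy\n"
        "member 2ea8f3a5eea05584 is healthy\n"
    )
    print(unhealthy_members(health, parse_member_names(member_list)))
    # prints ['172-16-1-20']
```

Running the two commands periodically and feeding their output through these helpers would give a timestamped record of which member is reported unhealthy, which may help correlate the flapping with the stress test window.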
If you have any idea why this could be going wrong, please let me know. I have probably forgotten something obvious, but I cannot spot it!

Thanks,
Austin

sprout_20151022T140000Z.txt <https://drive.google.com/file/d/0BwD2rKlmArODN2V3NnJMeC1Vcms/view?usp=drive_web>
all_sip_log.tgz <https://drive.google.com/file/d/0BwD2rKlmArODRXBsQXpySHhuSGc/view?usp=drive_web>
_______________________________________________
Clearwater mailing list
[email protected]
http://lists.projectclearwater.org/mailman/listinfo/clearwater_lists.projectclearwater.org
