Hi Austin,

‘Waiting’ is an expected state of poll_etcd_cluster.
We use monit (https://mmonit.com/monit/) to track our processes, and it checks 
process liveness and responsiveness roughly every 10 seconds. We don’t want 
the poll_etcd_cluster script running that often, though, so monit is set to 
check it only on every 6th cycle. On the other five cycles out of six, monit 
doesn’t run the poll_etcd_cluster script and just reports its status as ‘Waiting’.
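
To make the pattern concrete, here is a rough Python sketch of what you would see if you looked at the status on each monit cycle. It is only an illustration, not how monit is implemented: the function names are made up, the 10 second interval is approximate, and I am assuming the script runs on cycles 6, 12, ... and succeeds each time it runs.

```python
CYCLE_SECONDS = 10  # approximate monit cycle length (assumption: "around every 10 seconds")
RUN_EVERY = 6       # poll_etcd_cluster is only checked on every 6th cycle


def status_for_cycle(cycle, run_every=RUN_EVERY):
    """Return the status shown for poll_etcd_cluster on a 1-based monit cycle."""
    if cycle % run_every == 0:
        # The script actually runs on this cycle; assume it succeeds.
        return "Status ok"
    # On the other cycles the script is skipped and monit shows 'Waiting'.
    return "Waiting"


def simulate(cycles):
    """Return the list of statuses for cycles 1..cycles."""
    return [status_for_cycle(c) for c in range(1, cycles + 1)]


if __name__ == "__main__":
    for cycle, status in enumerate(simulate(12), start=1):
        print(f"t={cycle * CYCLE_SECONDS:>3}s  cycle {cycle:>2}: {status}")
```

So if you check the status at an arbitrary moment, you should expect to see ‘Waiting’ about five times out of six, even on a healthy node.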

Can you please send me the debug Sprout logs from the tests that generate 503s?

Ellie

From: Clearwater [mailto:[email protected]] On Behalf Of Austin Marston
Sent: 16 November 2015 16:56
To: [email protected]
Subject: [Clearwater] poll_etcd_cluster waiting on manual install

Hi all,

I have a manual install of Clearwater without redundancy that I just deployed 
today from the last Clearwater release, and I cannot get my cluster to work.

All my nodes report their processes as "running" or "status ok"; however, 
'poll_etcd_cluster' on each node stays in "Waiting".

Although my cluster seems healthy:
[bono-sipp-sprout]user@cw-012:/var/log/sprout$ clearwater-etcdctl cluster-health
cluster is healthy
member 1d362398497f4d32 is healthy
member 5beec250e4e21a93 is healthy
member 66082b976e9208ce is healthy
member bd214c262ac666e3 is healthy
member d05d0bbb1534c7ee is healthy
member fd62bae1dd31c0f8 is healthy

there is still a problem I cannot track down, and it does not seem to be 
configuration related, as about 50% of the live tests pass while the others 
usually get a 503 response.


Here are some etcd logs from the Sprout node:
2015/11/16 17:43:53 rafthttp: failed to dial fd62bae1dd31c0f8 on stream MsgApp v2 (dial tcp 172.16.1.13:2380: i/o timeout)
2015/11/16 17:43:53 rafthttp: failed to dial fd62bae1dd31c0f8 on stream Message (dial tcp 172.16.1.13:2380: i/o timeout)
2015/11/16 17:43:53 etcdhttp: [GET] /v2/keys/clearwater/site1/sprout/clustering/chronos?waitIndex=88&recursive=false&wait=true remote:172.16.1.12:39329
2015/11/16 17:43:53 etcdhttp: [GET] /v2/keys/clearwater/site1/sprout/clustering/memcached?waitIndex=88&recursive=false&wait=true remote:172.16.1.12:39330
[bono-sipp-sprout]user@cw-012:/var/log/sprout$ tail /var/log/clearwater-etcd/clearwater-etcd.log
2015/11/16 17:47:06 rafthttp: failed to dial fd62bae1dd31c0f8 on stream MsgApp v2 (dial tcp 172.16.1.13:2380: i/o timeout)
2015/11/16 17:47:06 rafthttp: failed to dial fd62bae1dd31c0f8 on stream Message (dial tcp 172.16.1.13:2380: i/o timeout)
2015/11/16 17:47:06 etcdhttp: [GET] /v2/keys/clearwater/site1/configuration/shared_config?quorum=true remote:172.16.1.12:32817
2015/11/16 17:47:06 etcdhttp: [GET] /v2/keys/clearwater/site1/configuration/scscf_json?quorum=true remote:172.16.1.12:32816
2015/11/16 17:47:06 etcdhttp: [GET] /v2/keys/clearwater/site1/configuration/enum_json?quorum=true remote:172.16.1.12:32814
2015/11/16 17:47:06 etcdhttp: [GET] /v2/keys/clearwater/site1/configuration/bgcf_json?quorum=true remote:172.16.1.12:32815
2015/11/16 17:47:07 etcdhttp: [GET] /v2/keys/clearwater/site1/sprout/clustering/chronos?waitIndex=88&recursive=false&wait=true remote:172.16.1.12:45031
2015/11/16 17:47:07 etcdhttp: [GET] /v2/keys/clearwater/site1/sprout/clustering/memcached?waitIndex=88&recursive=false&wait=true remote:172.16.1.12:45034
2015/11/16 17:47:07 rafthttp: failed to dial fd62bae1dd31c0f8 on stream MsgApp v2 (dial tcp 172.16.1.13:2380: i/o timeout)
2015/11/16 17:47:07 rafthttp: failed to dial fd62bae1dd31c0f8 on stream Message (dial tcp 172.16.1.13:2380: i/o timeout)

Thank you in advance for any help!

Austin
_______________________________________________
Clearwater mailing list
[email protected]
http://lists.projectclearwater.org/mailman/listinfo/clearwater_lists.projectclearwater.org
