Hi Sarbajit,

That is very strange – netstat thinks there is a process listening on port 5060 on the host, but netcat doesn’t. I’m not really sure what further to suggest. I’ve got a couple of ideas for things you could do to try and narrow down the problem, but I’m afraid there’s nothing particularly concrete.
You could try stopping the Bono container and running netcat as both a server and a client on the host to check that the host hasn’t got some weird iptables rule blocking the connection. It might also be worth taking packet captures to try and work out exactly what’s going on. Also, have you tried just deploying on a single host using Docker (i.e. not using Docker Swarm)? It would be good to verify that that works.

I hope you manage to get to the bottom of the problem!

Thanks,
Graeme

From: Clearwater [mailto:clearwater-boun...@lists.projectclearwater.org] On Behalf Of Sarbajit Chatterjee
Sent: 22 September 2016 12:14
To: clearwater@lists.projectclearwater.org
Subject: Re: [Project Clearwater] Deploy Clearwater in a Swarm cluster using docker-compose

Hi Graeme,

I can connect to port 5060 on 10.0.1.2 from the sprout container.

root@bb818f0a5535:/# nc -v -z 10.0.1.2 5060
Connection to 10.0.1.2 5060 port [tcp/sip] succeeded!
root@bb818f0a5535:/#

docker ps on the cluster shows the bono container exposing port 5060:

CONTAINER ID   IMAGE   COMMAND   CREATED   STATUS   PORTS   NAMES
bb818f0a5535   swarm-node:5000/clearwaterdocker_sprout   "/usr/bin/supervisord"   47 hours ago   Up 47 hours   5052/tcp, 5054/tcp, 10.109.190.10:32822->22/tcp   swarm-node/clearwaterdocker_sprout_1
9ff210721451   swarm-node:5000/clearwaterdocker_homer   "/usr/bin/supervisord"   47 hours ago   Up 47 hours   7888/tcp, 10.109.190.9:32788->22/tcp   swarm-master/clearwaterdocker_homer_1
51653420c979   swarm-node:5000/clearwaterdocker_homestead   "/usr/bin/supervisord"   47 hours ago   Up 47 hours   8888-8889/tcp, 10.109.190.9:32787->22/tcp   swarm-master/clearwaterdocker_homestead_1
e994b17b4563   swarm-node:5000/clearwaterdocker_ellis   "/usr/bin/supervisord"   47 hours ago   Up 47 hours   10.109.190.10:80->80/tcp, 10.109.190.10:32821->22/tcp   swarm-node/clearwaterdocker_ellis_1
9837c4dab241   swarm-node:5000/clearwaterdocker_bono   "/usr/bin/supervisord"   47 hours ago   Up 47 hours   10.109.190.9:3478->3478/tcp, 10.109.190.9:3478->3478/udp, 10.109.190.9:5060->5060/tcp, 10.109.190.9:5062->5062/tcp, 10.109.190.9:5060->5060/udp, 5058/tcp, 10.109.190.9:32786->22/tcp   swarm-master/clearwaterdocker_bono_1
3db967c58754   swarm-node:5000/clearwaterdocker_ralf   "/usr/bin/supervisord"   47 hours ago   Up 47 hours   10888/tcp, 10.109.190.10:32820->22/tcp   swarm-node/clearwaterdocker_ralf_1
c499d05af8e7   quay.io/coreos/etcd:v2.2.5   "/etcd -name etcd0 -a"   47 hours ago   Up 47 hours   2379-2380/tcp, 4001/tcp, 7001/tcp   swarm-node/clearwaterdocker_etcd_1
5fe0c51979e7   ubuntu:14.04.5   "/bin/bash"   8 days ago   Up 8 days

But I can't reach port 5060 from the host machine where the container is launched, even though the port seems to be open.

root@swarm-master:~# nc -v -z 10.109.190.9 5060
nc: connect to 10.109.190.9 port 5060 (tcp) failed: Connection refused
root@swarm-master:~#
root@swarm-master:~# netstat -anp | grep 5060
tcp6 0 0 :::5060 :::* LISTEN 21818/docker-proxy
udp6 0 0 :::5060 :::* 21829/docker-proxy
root@swarm-master:~#

Any suggestions?

-Sarbajit

On Thu, Sep 22, 2016 at 4:08 PM, Graeme Robertson (projectclearwater.org) <g...@projectclearwater.org> wrote:

Hi Sarbajit,

That output all looks fine, but it sounds as though the port mapping has failed – i.e. port 5060 on the Bono container hasn’t been exposed as port 5060 on the host. I’m not sure why that would have failed. Can you run nc -z -v 10.0.1.2 5060 inside the Bono container? (This should work, but it’s worth doing as a sanity check!) Then can you run docker ps in your clearwater-docker checkout? The output should include a line that looks something like

0b4058027844   clearwaterdocker_bono   "/usr/bin/supervisord"   About a minute ago   Up About a minute   0.0.0.0:3478->3478/tcp, 0.0.0.0:3478->3478/udp, 0.0.0.0:5060->5060/tcp, 0.0.0.0:5062->5062/tcp, 0.0.0.0:5060->5060/udp, 5058/tcp, 0.0.0.0:42513->22/tcp   clearwaterdocker_bono_1

which should indicate that the port mapping is active.
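As a rough illustration of reading that PORTS column, the Python sketch below splits one docker ps PORTS value into published mappings (host address and port on the left of "->") and ports that are only exposed (no "->"). The names PortEntry, parse_ports and is_published are hypothetical helpers made up for this example, not part of Docker or Clearwater, and the sketch only reflects what Docker reports; it says nothing about whether the mapped port is actually reachable.

```python
import re
from typing import List, NamedTuple, Optional


class PortEntry(NamedTuple):
    host_ip: Optional[str]    # None when the port is exposed but not published
    host_port: Optional[int]  # None when the port is exposed but not published
    container_port: int
    proto: str


# "10.109.190.9:5060->5060/tcp" is a published port; "5058/tcp" is exposed only.
_PUBLISHED = re.compile(r"^(.+):(\d+)->(\d+)/(\w+)$")
_EXPOSED = re.compile(r"^(\d+)/(\w+)$")


def parse_ports(column: str) -> List[PortEntry]:
    """Parse the PORTS column of one `docker ps` row (hypothetical helper)."""
    entries = []
    for item in (part.strip() for part in column.split(",")):
        published = _PUBLISHED.match(item)
        exposed = _EXPOSED.match(item)
        if published:
            ip, host_port, container_port, proto = published.groups()
            entries.append(PortEntry(ip, int(host_port), int(container_port), proto))
        elif exposed:
            container_port, proto = exposed.groups()
            entries.append(PortEntry(None, None, int(container_port), proto))
        # Anything else (for example a range such as "8888-8889/tcp") is skipped.
    return entries


def is_published(column: str, container_port: int, proto: str = "tcp") -> bool:
    """True if Docker reports a host-side mapping for container_port/proto."""
    return any(
        entry.container_port == container_port
        and entry.proto == proto
        and entry.host_port is not None
        for entry in parse_ports(column)
    )
```

Passing the Bono row's PORTS value to is_published with port 5060 tells you whether Docker believes the mapping exists; a connection test against the host is still needed to confirm that traffic gets through.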
If this all looks fine it might be worth also running nc -v -z 10.109.190.9 5060 on the host that’s running the Bono container.

Thanks,
Graeme

________________________________
From: Clearwater [mailto:clearwater-boun...@lists.projectclearwater.org] On Behalf Of Sarbajit Chatterjee
Sent: 21 September 2016 19:39
To: clearwater@lists.projectclearwater.org
Subject: Re: [Project Clearwater] Deploy Clearwater in a Swarm cluster using docker-compose

Hi Graeme,

I'm using the following command to run the live test:

rake test[example.com] TESTS="Basic*" SIGNUP_CODE=secret PROXY=10.109.190.9 ELLIS=10.109.190.10

Here the PROXY IP is where the bono container is launched and the ELLIS IP is where the ellis container is launched.

The bono service seems to be running in the container:

root@9837c4dab241:/# ps -eaf | grep bono
root 122 1 0 Sep19 ? 00:00:10 /usr/share/clearwater/clearwater-cluster-manager/env/bin/python /usr/share/clearwater/bin/clearwater-cluster-manager --mgmt-local-ip=10.0.1.2 --sig-local-ip=10.0.1.2 --local-site=site1 --remote-site= --remote-cassandra-seeds= --signaling-namespace= --uuid=18c7daf3-a098-47ae-962f-a3d57c0cff6f --etcd-key=clearwater --etcd-cluster-key=bono --log-level=3 --log-directory=/var/log/clearwater-cluster-manager --pidfile=/var/run/clearwater-cluster-manager.pid
root 124 1 0 Sep19 ? 00:00:00 /bin/bash /etc/init.d/bono run
root 139 124 0 Sep19 ? 00:00:00 /bin/bash /usr/share/clearwater/bin/run-in-signaling-namespace start-stop-daemon --start --quiet --exec /usr/share/clearwater/bin/bono --chuid bono --chdir /etc/clearwater -- --domain=example.com --localhost=10.0.1.2,10.0.1.2 --alias=10.0.1.2 --pcscf=5060,5058 --webrtc-port=5062 --routing-proxy=scscf.sprout,5052,50,600 --ralf=ralf:10888 --sas=0.0.0.0,bono@10.0.1.2 --dns-server=127.0.0.11 --worker-threads=4 --analytics=/var/log/bono --log-file=/var/log/bono --log-level=2
bono 140 139 0 Sep19 ? 00:11:15 /usr/share/clearwater/bin/bono --domain=example.com --localhost=10.0.1.2,10.0.1.2 --alias=10.0.1.2 --pcscf=5060,5058 --webrtc-port=5062 --routing-proxy=scscf.sprout,5052,50,600 --ralf=ralf:10888 --sas=0.0.0.0,bono@10.0.1.2 --dns-server=127.0.0.11 --worker-threads=4 --analytics=/var/log/bono --log-file=/var/log/bono --log-level=2
root 322 293 0 17:48 ? 00:00:00 grep --color=auto bono
root@9837c4dab241:/#
root@9837c4dab241:/# netstat -planut | grep 5060
tcp 0 0 10.0.1.2:5060 0.0.0.0:* LISTEN -
udp 0 0 10.0.1.2:5060 0.0.0.0:* -
root@9837c4dab241:/#

But the connection to bono is failing from the livetest container, as you had predicted.
root@40efba73deb5:~/clearwater-live-test# nc -v -z 10.109.190.9 5060
nc: connect to 10.109.190.9 port 5060 (tcp) failed: Connection refused
root@40efba73deb5:~/clearwater-live-test#

On checking the bono log, I see a series of errors like the ones below at the beginning of the log:

19-09-2016 15:41:44.612 UTC Status utils.cpp:591: Log level set to 2
19-09-2016 15:41:44.612 UTC Status main.cpp:1388: Access logging enabled to /var/log/bono
19-09-2016 15:41:44.613 UTC Warning main.cpp:1435: SAS server option was invalid or not configured - SAS is disabled
19-09-2016 15:41:44.613 UTC Warning main.cpp:1511: A registration expiry period should not be specified for P-CSCF
19-09-2016 15:41:44.613 UTC Status snmp_agent.cpp:117: AgentX agent initialised
19-09-2016 15:41:44.613 UTC Status load_monitor.cpp:105: Constructing LoadMonitor
19-09-2016 15:41:44.613 UTC Status load_monitor.cpp:106: Target latency (usecs) : 100000
19-09-2016 15:41:44.613 UTC Status load_monitor.cpp:107: Max bucket size : 1000
19-09-2016 15:41:44.613 UTC Status load_monitor.cpp:108: Initial token fill rate/s: 100.000000
19-09-2016 15:41:44.613 UTC Status load_monitor.cpp:109: Min token fill rate/s : 10.000000
19-09-2016 15:41:44.613 UTC Status dnscachedresolver.cpp:144: Creating Cached Resolver using servers:
19-09-2016 15:41:44.613 UTC Status dnscachedresolver.cpp:154: 127.0.0.11
19-09-2016 15:41:44.613 UTC Status sipresolver.cpp:60: Created SIP resolver
19-09-2016 15:41:44.637 UTC Status stack.cpp:419: Listening on port 5058
19-09-2016 15:41:44.637 UTC Status stack.cpp:419: Listening on port 5060
19-09-2016 15:41:44.638 UTC Status stack.cpp:855: Local host aliases:
19-09-2016 15:41:44.638 UTC Status stack.cpp:862: 10.0.1.2
19-09-2016 15:41:44.638 UTC Status stack.cpp:862: 172.18.0.2
19-09-2016 15:41:44.638 UTC Status stack.cpp:862: 10.0.1.2
19-09-2016 15:41:44.638 UTC Status stack.cpp:862: 10.0.1.2
19-09-2016 15:41:44.638 UTC Status stack.cpp:862:
19-09-2016 15:41:44.639 UTC Status httpresolver.cpp:52: Created HTTP resolver
19-09-2016 15:41:44.641 UTC Status httpconnection.cpp:114: Configuring HTTP Connection
19-09-2016 15:41:44.641 UTC Status httpconnection.cpp:115: Connection created for server ralf:10888
19-09-2016 15:41:44.641 UTC Status httpconnection.cpp:116: Connection will use a response timeout of 500ms
19-09-2016 15:41:44.642 UTC Status connection_pool.cpp:72: Creating connection pool to scscf.sprout:5052
19-09-2016 15:41:44.642 UTC Status connection_pool.cpp:73: connections = 50, recycle time = 600 +/- 120 seconds
19-09-2016 15:41:44.649 UTC Status bono.cpp:3314: Create list of PBXes
19-09-2016 15:41:44.649 UTC Status pluginloader.cpp:63: Loading plug-ins from /usr/share/clearwater/sprout/plugins
19-09-2016 15:41:44.649 UTC Status pluginloader.cpp:158: Finished loading plug-ins
19-09-2016 15:41:44.652 UTC Warning (Net-SNMP): Warning: Failed to connect to the agentx master agent ([NIL]):
19-09-2016 15:41:44.653 UTC Error pjsip: tcpc0x14f3df8 TCP connect() error: Connection refused [code=120111]
19-09-2016 15:41:44.653 UTC Error pjsip: tcpc0x14f5c38 TCP connect() error: Connection refused [code=120111]

I have observed that in the bono container port 5060 is not listening on all interfaces, while port 5062 is.

root@9837c4dab241:/var/log/bono# netstat -planut | grep LISTEN
tcp 0 0 10.0.1.2:4000 0.0.0.0:* LISTEN -
tcp 0 0 10.0.1.2:5058 0.0.0.0:* LISTEN -
tcp 0 0 127.0.0.11:43395 0.0.0.0:* LISTEN -
tcp 0 0 10.0.1.2:5060 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:5062 0.0.0.0:* LISTEN -
tcp 0 0 127.0.0.1:8080 0.0.0.0:* LISTEN -
tcp 0 0 10.0.1.2:3478 0.0.0.0:* LISTEN -
tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 9/sshd
tcp6 0 0 :::22 :::* LISTEN 9/sshd
root@9837c4dab241:/var/log/bono#

Is it right for port 5060 to listen only on the local address? Please help me debug the issue.
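The difference between the two kinds of bind can be picked out of the netstat output mechanically. The Python sketch below is only an illustration, assuming the column layout shown in the output above (protocol, two queue counters, local address, foreign address, state). The helper names listen_addresses and bound_to_all_interfaces are made up for this example. It only classifies binds as wildcard or address-specific; it does not establish whether the specific bind is what breaks the port mapping.

```python
from typing import Dict, Set

# Local addresses that mean "every interface" in netstat output.
WILDCARDS = {"0.0.0.0", "::", "*"}


def listen_addresses(netstat_output: str) -> Dict[int, Set[str]]:
    """Map each listening TCP port to the set of local addresses it is bound to."""
    result: Dict[int, Set[str]] = {}
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expect: proto recv-q send-q local-address foreign-address state ...
        if len(fields) < 6 or not fields[0].startswith("tcp") or fields[5] != "LISTEN":
            continue
        # Split on the last colon so IPv6 forms such as ":::22" give address "::".
        address, _, port = fields[3].rpartition(":")
        if port.isdigit():
            result.setdefault(int(port), set()).add(address)
    return result


def bound_to_all_interfaces(netstat_output: str, port: int) -> bool:
    """True if the port has at least one wildcard (all-interfaces) listener."""
    return bool(listen_addresses(netstat_output).get(port, set()) & WILDCARDS)
```

Run against the output above, such a check would flag 5060 as bound to 10.0.1.2 only and 5062 as bound to all interfaces, which is the observation made in this message.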
Thanks,
Sarbajit

On Wed, Sep 21, 2016 at 10:01 PM, Graeme Robertson (projectclearwater.org) <gra...@projectclearwater.org> wrote:

Hi Sarbajit,

I’ve had another look at this, and actually I think clearwater-live-test checks it can connect to Bono before it tries to provision numbers from Ellis, and it’s actually that connection that’s failing – apologies! Can you do similar checks for the Bono container, i.e. connect to the Bono container and run ps -eaf | grep bono and run nc -z -v <ip> 5060 from your live test container (where <ip> is the IP of your Bono)?

One other thought – what command are you using to run the tests? You’ll need to set the PROXY option to your Bono IP and the ELLIS option to your Ellis IP.

Thanks,
Graeme

From: Clearwater [mailto:clearwater-boun...@lists.projectclearwater.org] On Behalf Of Sarbajit Chatterjee
Sent: 21 September 2016 16:41
To: clearwater@lists.projectclearwater.org
Subject: Re: [Project Clearwater] Deploy Clearwater in a Swarm cluster using docker-compose

Thanks Graeme for your reply. Here are the command outputs that you asked for:

root@e994b17b4563:/# ps -eaf | grep ellis
root 177 1 0 Sep19 ? 00:00:10 /usr/share/clearwater/clearwater-cluster-manager/env/bin/python /usr/share/clearwater/bin/clearwater-cluster-manager --mgmt-local-ip=10.0.1.7 --sig-local-ip=10.0.1.7 --local-site=site1 --remote-site= --remote-cassandra-seeds= --signaling-namespace= --uuid=18c7daf3-a098-47ae-962f-a3d57c0cff6f --etcd-key=clearwater --etcd-cluster-key=ellis --log-level=3 --log-directory=/var/log/clearwater-cluster-manager --pidfile=/var/run/clearwater-cluster-manager.pid
root 180 1 0 Sep19 ? 00:00:00 /bin/sh /etc/init.d/ellis run
ellis 185 180 0 Sep19 ? 00:00:05 /usr/share/clearwater/ellis/env/bin/python -m metaswitch.ellis.main
root 287 253 0 15:18 ? 00:00:00 grep --color=auto ellis
root@e994b17b4563:/#
root@e994b17b4563:/# ps -eaf | grep nginx
root 179 1 0 Sep19 ? 00:00:00 nginx: master process /usr/sbin/nginx -g daemon off;
www-data 186 179 0 Sep19 ? 00:00:16 nginx: worker process
www-data 187 179 0 Sep19 ? 00:00:00 nginx: worker process
www-data 188 179 0 Sep19 ? 00:00:16 nginx: worker process
www-data 189 179 0 Sep19 ? 00:00:16 nginx: worker process
root 289 253 0 15:19 ? 00:00:00 grep --color=auto nginx
root@e994b17b4563:/#
root@e994b17b4563:/# netstat -planut | grep nginx
tcp6 0 0 :::80 :::* LISTEN 179/nginx -g daemon
root@e994b17b4563:/#

I think both ellis and nginx are running fine inside the container. I can also open the ellis login page from a web browser. I also checked the MySQL DB in the ellis container: I can see the livetest user entry in the users table and 1000 rows in the numbers table.

I can also connect to ellis (host IP 10.109.190.10) from my livetest container:

root@40efba73deb5:~/clearwater-live-test# nc -v -z 10.109.190.10 80
Connection to 10.109.190.10 80 port [tcp/http] succeeded!
root@40efba73deb5:~/clearwater-live-test#

Is this happening because the Clearwater containers are spread across multiple hosts? What other areas should I check?

Thanks,
Sarbajit

On Wed, Sep 21, 2016 at 6:09 PM, Graeme Robertson (projectclearwater.org) <gra...@projectclearwater.org> wrote:

Hi Sarbajit,

I don’t think we’ve ever tried deploying Project Clearwater in a Docker Swarm cluster, but I don’t see any reason why it couldn’t work. The tests are failing very early – they’re not able to connect to Ellis on port 80. I can think of a couple of reasons for this – either Ellis isn’t running or the Ellis port mapping hasn’t worked for some reason. Can you connect to the Ellis container and run ps -eaf | grep ellis and ps -eaf | grep nginx to confirm that NGINX and Ellis are running?
Can you also run sudo netstat -planut | grep nginx or something equivalent to check that NGINX is listening on port 80? If there’s a problem with either NGINX or Ellis we probably need to look in the logs at /var/log/nginx/ or /var/log/ellis/ on the Ellis container. If however that all looks fine, then it sounds like the port mapping has failed for some reason. Can you run nc -z <ip> 80 from the box you’re running the live tests on? This will scan for anything listening at <ip>:80 and will return successfully if it finds anything.

Thanks,
Graeme

________________________________
From: Clearwater [mailto:clearwater-boun...@lists.projectclearwater.org] On Behalf Of Sarbajit Chatterjee
Sent: 20 September 2016 15:05
To: clearwater@lists.projectclearwater.org
Subject: [Project Clearwater] Deploy Clearwater in a Swarm cluster using docker-compose

Hello,

I am following the instructions from https://github.com/Metaswitch/clearwater-docker. I can successfully deploy it on a single Docker node, but the compose file does not work with a Swarm cluster.
I did try to modify the compose file like this:

version: '2'
services:
  etcd:
    image: quay.io/coreos/etcd:v2.2.5
    command: >
      -name etcd0
      -advertise-client-urls http://etcd:2379,http://etcd:4001
      -listen-client-urls http://0.0.0.0:2379,http://0.0.0.0:4001
      -initial-advertise-peer-urls http://etcd:2380
      -listen-peer-urls http://0.0.0.0:2380
      -initial-cluster etcd0=http://etcd:2380
      -initial-cluster-state new
  bono:
    image: swarm-node:5000/clearwaterdocker_bono
    ports:
      - 22
      - "3478:3478"
      - "3478:3478/udp"
      - "5060:5060"
      - "5060:5060/udp"
      - "5062:5062"
  sprout:
    image: swarm-node:5000/clearwaterdocker_sprout
    networks:
      default:
        aliases:
          - scscf.sprout
          - icscf.sprout
    ports:
      - 22
  homestead:
    image: swarm-node:5000/clearwaterdocker_homestead
    ports:
      - 22
  homer:
    image: swarm-node:5000/clearwaterdocker_homer
    ports:
      - 22
  ralf:
    image: swarm-node:5000/clearwaterdocker_ralf
    ports:
      - 22
  ellis:
    image: swarm-node:5000/clearwaterdocker_ellis
    ports:
      - 22
      - "80:80"

where swarm-node:5000 is the local Docker registry, which hosts the pre-built images of the Clearwater containers.

Even though the deployment succeeded, the Clearwater live tests are failing with the following error:

Basic Registration (TCP) - Failed
  Errno::ECONNREFUSED thrown:
   - Connection refused - connect(2)
     - /usr/local/rvm/gems/ruby-1.9.3-p551/gems/quaff-0.7.3/lib/sources.rb:41:in `initialize'

Any suggestions on how I can deploy Clearwater on a Swarm cluster?
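The ECONNREFUSED error above means that a TCP connect was actively rejected. The same check can be reproduced outside the test suite with a plain TCP connect, which is roughly what nc -z does. The Python sketch below is a minimal illustration using only the standard library; tcp_port_open is a hypothetical helper name and the 3-second timeout is an arbitrary assumption.

```python
import socket


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be established.

    A refused connection, a timeout, or a name-resolution failure all count
    as "not open"; they are all subclasses of OSError.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling tcp_port_open with a published host address and port from the compose file (for example the Bono host and 5060, or the Ellis host and 80) from the machine running the live tests shows whether each port mapping is reachable from there.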
Thanks,
Sarbajit

_______________________________________________
Clearwater mailing list
Clearwater@lists.projectclearwater.org
http://lists.projectclearwater.org/mailman/listinfo/clearwater_lists.projectclearwater.org