Hi,

We recently built a Clearwater deployment with one Bono node, two Sprout
nodes, one Homestead node, one Homer node and one Ralf node. However, we
ran into problems with Sprout clustering and with Homestead failing to
start.

*Sprout clustering:*
According to the manual installation instructions, Sprout clustering in the
latest version is handled by Chronos, and to add or remove a Sprout node,
/etc/chronos/chronos.conf needs to be modified accordingly.
However, we found that even without a chronos.conf file, the two Sprout
nodes seem to work fine once we add the IPs of both Sprout nodes to
/etc/clearwater/cluster_settings.

[sprout]cw@sprout-2:~$ cat /etc/clearwater/cluster_settings
servers=192.168.1.21:11211
servers=192.168.1.22:11211

But if we do add /etc/chronos/chronos.conf with the details of the two
Sprout nodes as below, Chronos fails to start and no new log files appear
under /var/log/chronos.

[sprout]cw@sprout-1:/var/log/chronos$ cat /etc/chronos/chronos.conf
[http]
bind-address = 0.0.0.0
bind-port = 7253

[logging]
folder = /var/log/chronos
level = 5

[cluster]
localhost = 192.168.1.21
node = localhost

sprout-2 = 192.168.1.22
node = sprout-2

[alarms]
enabled = true


[sprout]cw@sprout-1:~$ sudo monit status
The Monit daemon 5.8.1 uptime: 0m

Program 'poll_sprout'
  status                            Status ok
  monitoring status                 Monitored
  last started                      Wed, 04 Feb 2015 11:20:36
  last exit value                   0
  data collected                    Wed, 04 Feb 2015 11:20:36

Process 'sprout'
  status                            Running
  monitoring status                 Monitored
  pid                               1157
  parent pid                        1
  uid                               999
  effective uid                     999
  gid                               999
  uptime                            1m
  children                          0
  memory kilobytes                  42412
  memory kilobytes total            42412
  memory percent                    1.0%
  memory percent total              1.0%
  cpu percent                       0.4%
  cpu percent total                 0.4%
  data collected                    Wed, 04 Feb 2015 11:20:36

Program 'poll_memcached'
  status                            Status ok
  monitoring status                 Monitored
  last started                      Wed, 04 Feb 2015 11:20:36
  last exit value                   0
  data collected                    Wed, 04 Feb 2015 11:20:36

Process 'memcached'
  status                            Running
  monitoring status                 Monitored
  pid                               1092
  parent pid                        1
  uid                               108
  effective uid                     108
  gid                               114
  uptime                            1m
  children                          0
  memory kilobytes                  1180
  memory kilobytes total            1180
  memory percent                    0.0%
  memory percent total              0.0%
  cpu percent                       0.0%
  cpu percent total                 0.0%
  data collected                    Wed, 04 Feb 2015 11:20:36

Process 'clearwater_diags_monitor'
  status                            Running
  monitoring status                 Monitored
  pid                               1072
  parent pid                        1
  uid                               0
  effective uid                     0
  gid                               0
  uptime                            1m
  children                          1
  memory kilobytes                  1796
  memory kilobytes total            2172
  memory percent                    0.0%
  memory percent total              0.0%
  cpu percent                       0.0%
  cpu percent total                 0.0%
  data collected                    Wed, 04 Feb 2015 11:20:36

Process 'chronos'
  status                            Execution failed
  monitoring status                 Monitored
  data collected                    Wed, 04 Feb 2015 11:20:26

System 'sprout-1'
  status                            Running
  monitoring status                 Monitored
  load average                      [0.20] [0.09] [0.04]
  cpu                               6.8%us 1.1%sy 0.0%wa
  memory usage                      116944 kB [2.8%]
  swap usage                        0 kB [0.0%]
  data collected                    Wed, 04 Feb 2015 11:20:26


Is this because we are not using Chronos correctly, or are there other
settings we need to configure?


*Homestead Failure:*

When we use SIPp to run user registration tests, we receive a "403
Forbidden" response, and we see the following errors on both Sprout nodes.

[sprout]cw@sprout-1:~$ cat /var/log/sprout/sprout_current.txt
04-02-2015 18:54:50.884 UTC Warning acr.cpp:627: Failed to send Ralf ACR message (0x7fce241cd780), rc = 400
04-02-2015 18:54:51.083 UTC Error httpconnection.cpp:573: http://hs.hp-clearwater.com:8888/impi/6500000008%40hp-clearwater.com/av?impu=sip%3A6500000008%40hp-clearwater.com failed at server 192.168.1.31 : Timeout was reached (28) : fatal
04-02-2015 18:54:51.083 UTC Error httpconnection.cpp:688: cURL failure with cURL error code 28 (see man 3 libcurl-errors) and HTTP error code 500
04-02-2015 18:54:51.083 UTC Error hssconnection.cpp:145: Failed to get Authentication Vector for [email protected]
04-02-2015 18:54:51.086 UTC Error httpconnection.cpp:688: cURL failure with cURL error code 0 (see man 3 libcurl-errors) and HTTP error code 400
04-02-2015 18:54:51.086 UTC Warning acr.cpp:627: Failed to send Ralf ACR message (0x14322c0), rc = 400
04-02-2015 18:54:51.282 UTC Error httpconnection.cpp:573: http://hs.hp-clearwater.com:8888/impi/6500000009%40hp-clearwater.com/av?impu=sip%3A6500000009%40hp-clearwater.com failed at server 192.168.1.31 : Timeout was reached (28) : fatal
04-02-2015 18:54:51.283 UTC Error httpconnection.cpp:688: cURL failure with cURL error code 28 (see man 3 libcurl-errors) and HTTP error code 500
04-02-2015 18:54:51.283 UTC Error hssconnection.cpp:145: Failed to get Authentication Vector for [email protected]
04-02-2015 18:54:51.286 UTC Error httpconnection.cpp:688: cURL failure with cURL error code 0 (see man 3 libcurl-errors) and HTTP error code 400
04-02-2015 18:54:51.286 UTC Warning acr.cpp:627: Failed to send Ralf ACR message (0x7fce1c1fdef0), rc = 400
....


It seems that Homestead is unreachable.
On the Homestead node, monit reports the following status:

[homestead]cw@homestead-1:~$ sudo monit status
The Monit daemon 5.8.1 uptime: 15m

Process 'nginx'
  status                            Running
  monitoring status                 Monitored
  pid                               1044
  parent pid                        1
  uid                               0
  effective uid                     0
  gid                               0
  uptime                            15m
  children                          4
  memory kilobytes                  1240
  memory kilobytes total            8448
  memory percent                    0.0%
  memory percent total              0.2%
  cpu percent                       0.0%
  cpu percent total                 0.0%
  port response time                0.000s to 127.0.0.1:80/ping [HTTP via TCP]
  data collected                    Wed, 04 Feb 2015 10:58:02

Program 'poll_homestead'
  status                            Status failed
  monitoring status                 Monitored
  last started                      Wed, 04 Feb 2015 10:58:02
  last exit value                   1
  data collected                    Wed, 04 Feb 2015 10:58:02

Process 'homestead'
  status                            Does not exist
  monitoring status                 Monitored
  data collected                    Wed, 04 Feb 2015 10:58:02

Program 'poll_homestead-prov'
  status                            Status ok
  monitoring status                 Monitored
  last started                      Wed, 04 Feb 2015 10:58:02
  last exit value                   0
  data collected                    Wed, 04 Feb 2015 10:58:02

Process 'homestead-prov'
  status                            Execution failed
  monitoring status                 Monitored
  data collected                    Wed, 04 Feb 2015 10:58:32

Process 'clearwater_diags_monitor'
  status                            Running
  monitoring status                 Monitored
  pid                               1027
  parent pid                        1
  uid                               0
  effective uid                     0
  gid                               0
  uptime                            16m
  children                          1
  memory kilobytes                  1664
  memory kilobytes total            2040
  memory percent                    0.0%
  memory percent total              0.0%
  cpu percent                       0.0%
  cpu percent total                 0.0%
  data collected                    Wed, 04 Feb 2015 10:58:32

Program 'poll_cassandra_ring'
  status                            Status ok
  monitoring status                 Monitored
  last started                      Wed, 04 Feb 2015 10:58:32
  last exit value                   0
  data collected                    Wed, 04 Feb 2015 10:58:32

Process 'cassandra'
  status                            Running
  monitoring status                 Monitored
  pid                               1280
  parent pid                        1277
  uid                               106
  effective uid                     106
  gid                               113
  uptime                            16m
  children                          0
  memory kilobytes                  1388648
  memory kilobytes total            1388648
  memory percent                    34.3%
  memory percent total              34.3%
  cpu percent                       0.4%
  cpu percent total                 0.4%
  data collected                    Wed, 04 Feb 2015 10:58:32

System 'homestead-1'
  status                            Running
  monitoring status                 Monitored
  load average                      [0.00] [0.04] [0.05]
  cpu                               3.0%us 0.8%sy 0.0%wa
  memory usage                      1505324 kB [37.1%]
  swap usage                        0 kB [0.0%]
  data collected                    Wed, 04 Feb 2015 10:58:32
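
To make the failing entries in the monit output above easier to spot, here is
a minimal Python 3 sketch that extracts them from the text of "sudo monit
status". The function name unhealthy_services and the set of statuses we treat
as healthy ("Running" and "Status ok") are our own assumptions, based only on
the output shown in this mail and not on anything monit itself defines.

```python
import re

# Section headers look like: Process 'homestead'
HEADER = re.compile(r"^(Process|Program|System) '([^']+)'\s*$")
# The per-entry status line starts with whitespace and then the word "status";
# "monitoring status" lines do not match because of the leading "monitoring".
STATUS = re.compile(r"^\s+status\s+(.+?)\s*$")

# Assumption: only these two values count as healthy.
HEALTHY = {"Running", "Status ok"}


def unhealthy_services(monit_output):
    """Return (name, status) pairs for entries whose status is not healthy."""
    problems = []
    current = None
    for line in monit_output.splitlines():
        header = HEADER.match(line)
        if header:
            current = header.group(2)
            continue
        status = STATUS.match(line)
        if status and current is not None:
            if status.group(1) not in HEALTHY:
                problems.append((current, status.group(1)))
            current = None  # only the first status line belongs to the entry
    return problems
```

Fed the Homestead output above, this should report poll_homestead, homestead
and homestead-prov.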


And the log files show:

[homestead]cw@homestead-1:~$ cat /var/log/homestead-prov/homestead-prov-err.log
Traceback (most recent call last):
  File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/share/clearwater/homestead/env/lib/python2.7/site-packages/crest-0.1-py2.7.egg/metaswitch/crest/main.py", line 156, in <module>
    standalone()
  File "/usr/share/clearwater/homestead/env/lib/python2.7/site-packages/crest-0.1-py2.7.egg/metaswitch/crest/main.py", line 119, in standalone
    reactor.listenUNIX(unix_sock_name, application)
  File "/usr/share/clearwater/homestead/env/local/lib/python2.7/site-packages/Twisted-12.3.0-py2.7-linux-x86_64.egg/twisted/internet/posixbase.py", line 413, in listenUNIX
    p.startListening()
  File "/usr/share/clearwater/homestead/env/local/lib/python2.7/site-packages/Twisted-12.3.0-py2.7-linux-x86_64.egg/twisted/internet/unix.py", line 293, in startListening
    raise CannotListenError, (None, self.port, le)
twisted.internet.error.CannotListenError: Couldn't listen on any:/tmp/.homestead-prov-sock-0: [Errno 98] Address already in use.
......

[homestead]cw@homestead-1:~$ cat /var/log/homestead-prov/homestead-prov-0.log
2015-02-04 18:42:23,476 UTC INFO main:118 Going to listen for HTTP on UNIX socket /tmp/.homestead-prov-sock-0
2015-02-04 18:42:24,087 UTC INFO main:118 Going to listen for HTTP on UNIX socket /tmp/.homestead-prov-sock-0
2015-02-04 18:42:35,826 UTC INFO main:118 Going to listen for HTTP on UNIX socket /tmp/.homestead-prov-sock-0
2015-02-04 18:43:16,205 UTC INFO main:118 Going to listen for HTTP on UNIX socket /tmp/.homestead-prov-sock-0
......
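
The homestead-prov traceback above fails with "[Errno 98] Address already in
use" on /tmp/.homestead-prov-sock-0. One thing we could check is whether that
socket file is a leftover with nothing accepting connections on it, or whether
a live process really holds it. Below is a minimal Python 3 sketch of such a
check. The function name unix_socket_state and its return labels are made up
for illustration, and we are only guessing that a stale socket file might be
involved.

```python
import os
import socket
import stat


def unix_socket_state(path):
    """Classify a UNIX socket path as 'missing', 'not-a-socket', 'stale' or 'live'."""
    try:
        mode = os.stat(path).st_mode
    except FileNotFoundError:
        return "missing"
    if not stat.S_ISSOCK(mode):
        return "not-a-socket"
    probe = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        # A socket file with no listener behind it refuses the connection.
        probe.connect(path)
    except ConnectionRefusedError:
        return "stale"
    finally:
        probe.close()
    return "live"


# Example (path taken from the log above):
#   print(unix_socket_state("/tmp/.homestead-prov-sock-0"))
```

Other errors, such as a permission problem on the socket file, are deliberately
left to propagate so they are not mistaken for a stale socket.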

homestead_20150204T180000Z.txt  homestead_current.txt
[homestead]cw@homestead-1:~$ cat /var/log/homestead/homestead_current.txt
04-02-2015 18:42:19.586 UTC Status main.cpp:468: Log level set to 2
04-02-2015 18:42:19.602 UTC Status main.cpp:489: Access logging enabled to /var/log/homestead
04-02-2015 18:42:19.614 UTC Status load_monitor.cpp:93: Constructing LoadMonitor
04-02-2015 18:42:19.614 UTC Status load_monitor.cpp:94:    Target latency (usecs)   : 100000
04-02-2015 18:42:19.614 UTC Status load_monitor.cpp:95:    Max bucket size          : 20
04-02-2015 18:42:19.614 UTC Status load_monitor.cpp:96:    Initial token fill rate/s: 10.000000
04-02-2015 18:42:19.614 UTC Status load_monitor.cpp:97:    Min token fill rate/s    : 10.000000
04-02-2015 18:42:19.614 UTC Status dnscachedresolver.cpp:90: Creating Cached Resolver using server 127.0.0.1
04-02-2015 18:42:19.614 UTC Status httpresolver.cpp:50: Created HTTP resolver
04-02-2015 18:42:19.614 UTC Status cassandra_store.cpp:145: Configuring store
04-02-2015 18:42:19.614 UTC Status cassandra_store.cpp:146:   Hostname:  localhost
04-02-2015 18:42:19.614 UTC Status cassandra_store.cpp:147:   Port:  9160
04-02-2015 18:42:19.614 UTC Status cassandra_store.cpp:148:   Threads:   10
04-02-2015 18:42:19.614 UTC Status cassandra_store.cpp:149:   Max Queue: 0
04-02-2015 18:42:19.614 UTC Status cassandra_store.cpp:199: Starting store
04-02-2015 18:42:19.616 UTC Error cassandra_store.cpp:207: Cache caught TTransportException: connect() failed: Connection refused
04-02-2015 18:42:19.616 UTC Error main.cpp:550: Failed to initialize cache - rc 3
04-02-2015 18:42:19.616 UTC Status cassandra_store.cpp:185: Stopping cache
04-02-2015 18:42:19.616 UTC Status cassandra_store.cpp:226: Waiting for cache to stop
......
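
The Homestead log above reports "Connection refused" when connecting to
Cassandra at localhost port 9160, and the Sprout log shows timeouts against
192.168.1.31 on port 8888. A quick way to re-test both from the relevant node
is a small TCP probe. This is only a Python 3 sketch: the function name
tcp_port_open and the 2 second default timeout are our own choices, and a
successful connect only shows that something is listening, not that the
service behind it is healthy.

```python
import socket


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and name resolution failures.
        return False


# Example (hosts and ports taken from the logs in this mail):
#   print(tcp_port_open("localhost", 9160))      # Cassandra, run on homestead-1
#   print(tcp_port_open("192.168.1.31", 8888))   # Homestead, run on a Sprout node
```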

And the port usage is:

[homestead]cw@homestead-1:~$ sudo netstat -tulpn
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.1:9042          0.0.0.0:*               LISTEN      1280/jsvc.exec
tcp        0      0 0.0.0.0:53              0.0.0.0:*               LISTEN      952/dnsmasq
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      827/sshd
tcp        0      0 127.0.0.1:7000          0.0.0.0:*               LISTEN      1280/jsvc.exec
tcp        0      0 127.0.0.1:2812          0.0.0.0:*               LISTEN      1036/monit
tcp        0      0 0.0.0.0:37791           0.0.0.0:*               LISTEN      1280/jsvc.exec
tcp        0      0 0.0.0.0:7199            0.0.0.0:*               LISTEN      1280/jsvc.exec
tcp        0      0 0.0.0.0:53313           0.0.0.0:*               LISTEN      1280/jsvc.exec
tcp        0      0 127.0.0.1:9160          0.0.0.0:*               LISTEN      1280/jsvc.exec
tcp6       0      0 :::53                   :::*                    LISTEN      952/dnsmasq
tcp6       0      0 :::22                   :::*                    LISTEN      827/sshd
tcp6       0      0 :::8889                 :::*                    LISTEN      1044/nginx
tcp6       0      0 :::80                   :::*                    LISTEN      1044/nginx
udp        0      0 0.0.0.0:13344           0.0.0.0:*                           952/dnsmasq
udp        0      0 0.0.0.0:48567           0.0.0.0:*                           952/dnsmasq
udp        0      0 0.0.0.0:53              0.0.0.0:*                           952/dnsmasq
udp        0      0 0.0.0.0:41016           0.0.0.0:*                           952/dnsmasq
udp        0      0 0.0.0.0:68              0.0.0.0:*                           634/dhclient3
udp        0      0 192.168.1.31:123        0.0.0.0:*                           791/ntpd
udp        0      0 127.0.0.1:123           0.0.0.0:*                           791/ntpd
udp        0      0 0.0.0.0:123             0.0.0.0:*                           791/ntpd
udp6       0      0 :::53                   :::*                                952/dnsmasq
udp6       0      0 fe80::f816:3eff:fe7:123 :::*                                791/ntpd
udp6       0      0 ::1:123                 :::*                                791/ntpd
udp6       0      0 :::123                  :::*                                791/ntpd



So, how should we fix the problems with Homestead and Homestead-prov?

Best regards,
Lianjie