Hi Ellie,
Thanks a lot for pointing out the relation to Cassandra!
I changed the logging level of Homestead and Homestead-prov to 5, cleared out
all the previous logs and restarted everything.
Here are the errors reported in /var/log/cassandra/system.log:
INFO [SSTableBatchOpen:1] 2015-02-10 09:24:06,559 SSTableReader.java (line 232) Opening /var/lib/cassandra/data/system/schema_keyspaces/system-schema_keyspaces-ic-16 (317 bytes)
ERROR [SSTableBatchOpen:1] 2015-02-10 09:24:06,573 CassandraDaemon.java (line 191) Exception in thread Thread[SSTableBatchOpen:1,5,main]
org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.EOFException
    at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:108)
    at org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:63)
    at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:42)
    at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:418)
    at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:209)
    at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:157)
    at org.apache.cassandra.io.sstable.SSTableReader$1.run(SSTableReader.java:273)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:701)
Caused by: java.io.EOFException
    at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340)
    at java.io.DataInputStream.readUTF(DataInputStream.java:589)
    at java.io.DataInputStream.readUTF(DataInputStream.java:564)
    at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:83)
    ... 12 more
INFO [SSTableBatchOpen:1] 2015-02-10 09:24:06,666 SSTableReader.java (line 232) Opening /var/lib/cassandra/data/system/schema_columnfamilies/system-schema_columnfamilies-ic-31 (6226 bytes)
ERROR [SSTableBatchOpen:1] 2015-02-10 09:24:06,670 CassandraDaemon.java (line 191) Exception in thread Thread[SSTableBatchOpen:1,5,main]
org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.EOFException
    at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:108)
    at org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:63)
    at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:42)
    at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:418)
    at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:209)
    at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:157)
    at org.apache.cassandra.io.sstable.SSTableReader$1.run(SSTableReader.java:273)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:701)
Caused by: java.io.EOFException
    at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340)
    at java.io.DataInputStream.readUTF(DataInputStream.java:589)
    at java.io.DataInputStream.readUTF(DataInputStream.java:564)
    at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:83)
    ... 12 more
INFO [SSTableBatchOpen:1] 2015-02-10 09:24:06,791 SSTableReader.java (line 232) Opening /var/lib/cassandra/data/system/schema_columns/system-schema_columns-ic-31 (3305 bytes)
ERROR [SSTableBatchOpen:1] 2015-02-10 09:24:06,794 CassandraDaemon.java (line 191) Exception in thread Thread[SSTableBatchOpen:1,5,main]
org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.EOFException
    at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:108)
    at org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:63)
    at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:42)
    at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:418)
    at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:209)
    at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:157)
    at org.apache.cassandra.io.sstable.SSTableReader$1.run(SSTableReader.java:273)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:701)
Caused by: java.io.EOFException
    at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340)
    at java.io.DataInputStream.readUTF(DataInputStream.java:589)
    at java.io.DataInputStream.readUTF(DataInputStream.java:564)
    at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:83)
    ... 12 more
INFO [SSTableBatchOpen:1] 2015-02-10 09:24:06,885 SSTableReader.java (line 232) Opening /var/lib/cassandra/data/system/local/system-local-ic-2 (120 bytes)
ERROR [SSTableBatchOpen:1] 2015-02-10 09:24:06,887 CassandraDaemon.java (line 191) Exception in thread Thread[SSTableBatchOpen:1,5,main]
org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.EOFException
    at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:108)
    at org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:63)
    at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:42)
    at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:418)
    at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:209)
    at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:157)
    at org.apache.cassandra.io.sstable.SSTableReader$1.run(SSTableReader.java:273)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:701)
Caused by: java.io.EOFException
    at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340)
    at java.io.DataInputStream.readUTF(DataInputStream.java:589)
    at java.io.DataInputStream.readUTF(DataInputStream.java:564)
    at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:83)
    ... 12 more
INFO [SSTableBatchOpen:1] 2015-02-10 09:24:06,889 SSTableReader.java (line 232) Opening /var/lib/cassandra/data/system/local/system-local-ic-1 (357 bytes)
ERROR [SSTableBatchOpen:1] 2015-02-10 09:24:06,891 CassandraDaemon.java (line 191) Exception in thread Thread[SSTableBatchOpen:1,5,main]
org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.EOFException
    at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:108)
    at org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:63)
    at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:42)
    at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:418)
    at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:209)
    at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:157)
    at org.apache.cassandra.io.sstable.SSTableReader$1.run(SSTableReader.java:273)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:701)
Caused by: java.io.EOFException
    at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340)
    at java.io.DataInputStream.readUTF(DataInputStream.java:589)
    at java.io.DataInputStream.readUTF(DataInputStream.java:564)
    at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:83)
    ... 12 more
INFO [SSTableBatchOpen:1] 2015-02-10 09:24:06,894 SSTableReader.java (line 232) Opening /var/lib/cassandra/data/system/local/system-local-ic-3 (109 bytes)
ERROR [SSTableBatchOpen:1] 2015-02-10 09:24:06,896 CassandraDaemon.java (line 191) Exception in thread Thread[SSTableBatchOpen:1,5,main]
org.apache.cassandra.io.sstable.CorruptSSTableException: java.io.EOFException
    at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:108)
    at org.apache.cassandra.io.compress.CompressionMetadata.create(CompressionMetadata.java:63)
    at org.apache.cassandra.io.util.CompressedPoolingSegmentedFile$Builder.complete(CompressedPoolingSegmentedFile.java:42)
    at org.apache.cassandra.io.sstable.SSTableReader.load(SSTableReader.java:418)
    at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:209)
    at org.apache.cassandra.io.sstable.SSTableReader.open(SSTableReader.java:157)
    at org.apache.cassandra.io.sstable.SSTableReader$1.run(SSTableReader.java:273)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:701)
Caused by: java.io.EOFException
    at java.io.DataInputStream.readUnsignedShort(DataInputStream.java:340)
    at java.io.DataInputStream.readUTF(DataInputStream.java:589)
    at java.io.DataInputStream.readUTF(DataInputStream.java:564)
    at org.apache.cassandra.io.compress.CompressionMetadata.<init>(CompressionMetadata.java:83)
    ... 12 more
Cassandra reported errors when opening the following SSTable files:
/var/lib/cassandra/data/system/schema_keyspaces/system-schema_keyspaces-ic-16
/var/lib/cassandra/data/system/schema_columnfamilies/system-schema_columnfamilies-ic-31
/var/lib/cassandra/data/system/schema_columns/system-schema_columns-ic-31
/var/lib/cassandra/data/system/local/system-local-ic-2
/var/lib/cassandra/data/system/local/system-local-ic-1
/var/lib/cassandra/data/system/local/system-local-ic-3
Do you know what those files are?
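You do not have to pick these file names out of system.log by hand. A short script can pair each "Opening ..." line with a CorruptSSTableException that follows it. This is only a minimal sketch, not a Cassandra or Clearwater tool, and the function name corrupt_sstables is hypothetical. It also assumes two things:

- The log lines are unwrapped, as they are in the real system.log.
- Opens from different SSTableBatchOpen threads are not interleaved.

```python
import re

# INFO line Cassandra writes before opening an SSTable, e.g.
# "... SSTableReader.java (line 232) Opening /var/lib/.../system-local-ic-2 (120 bytes)"
OPENING = re.compile(r"Opening (\S+) \(\d+ bytes\)")
# Exception line that follows when the open fails.
CORRUPT = re.compile(r"CorruptSSTableException")


def corrupt_sstables(log_text):
    """Return SSTable paths whose 'Opening' line is followed by a
    CorruptSSTableException before the next 'Opening' line."""
    corrupt = []
    current = None  # SSTable most recently being opened
    for line in log_text.splitlines():
        m = OPENING.search(line)
        if m:
            current = m.group(1)
        elif current and CORRUPT.search(line):
            corrupt.append(current)
            current = None  # report each SSTable at most once
    return corrupt
```

Usage: `corrupt_sstables(open("/var/log/cassandra/system.log").read())` returns the paths in the order in which they failed.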
I cross-checked this against the Homestead log file. It seems that while
Cassandra is initializing, Homestead reports "connect() failed: Connection
refused", and after that it reports "Cache caught unknown exception!":
10-02-2015 17:24:00.879 UTC Error cassandra_store.cpp:207: Cache caught TTransportException: connect() failed: Connection refused
10-02-2015 17:24:00.879 UTC Error main.cpp:550: Failed to initialize cache - rc 3
10-02-2015 17:24:02.411 UTC Debug zmq_lvc.cpp:144: Enabled XPUB_VERBOSE mode
10-02-2015 17:24:02.411 UTC Error cassandra_store.cpp:207: Cache caught TTransportException: connect() failed: Connection refused
10-02-2015 17:24:02.413 UTC Error main.cpp:550: Failed to initialize cache - rc 3
10-02-2015 17:24:16.154 UTC Error cassandra_store.cpp:217: Cache caught unknown exception!
10-02-2015 17:24:16.154 UTC Error main.cpp:550: Failed to initialize cache - rc 5
10-02-2015 17:24:56.569 UTC Error cassandra_store.cpp:217: Cache caught unknown exception!
10-02-2015 17:24:56.572 UTC Debug statistic.cpp:93: Initializing inproc://H_hss_latency_us statistic reporter
10-02-2015 17:24:56.572 UTC Debug statistic.cpp:93: Initializing inproc://H_latency_us statistic reporter
10-02-2015 17:24:56.572 UTC Error main.cpp:550: Failed to initialize cache - rc 5
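"Connection refused" means nothing was accepting connections on Cassandra's Thrift port when Homestead tried to connect. The earlier Homestead log in this thread shows it connects to localhost, port 9160. The netstat output shows Cassandra (jsvc.exec) listening on 127.0.0.1:9160 once it is up.

One rough way to test the start-up ordering theory is to wait until that port accepts TCP connections before starting Homestead. This is a sketch under that assumption only:

- The helper names are hypothetical.
- This is not how monit or the Clearwater init scripts actually sequence start-up.
- An open port does not prove that Cassandra has finished loading its schema.

```python
import socket
import time


def port_accepts_connections(host, port, timeout=1.0):
    """Return True if a TCP connect() to host:port succeeds.

    A refused connect() here is the same condition Homestead logs as
    "TTransportException: connect() failed: Connection refused".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def wait_for_port(host, port, deadline_s=60.0, interval_s=1.0):
    """Poll host:port until it accepts connections or deadline_s elapses."""
    end = time.monotonic() + deadline_s
    while True:
        if port_accepts_connections(host, port):
            return True
        if time.monotonic() >= end:
            return False
        time.sleep(interval_s)
```

For example, `wait_for_port("localhost", 9160)` returns False if Cassandra's Thrift listener is still not up after a minute.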
Another problem concerns Homer and Homestead-prov: both logs show an "Address
already in use" error.
After checking port usage, I found that port 7888 on the Homer node and port
8889 on the Homestead node are both held by nginx, although they are supposed
to be assigned to Homer and Homestead-prov respectively.
Do you know how to fix this?
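The homestead-prov traceback quoted further down fails while binding the UNIX socket /tmp/.homestead-prov-sock-0, not a TCP port. On Linux, bind() on a UNIX socket path fails with EADDRINUSE ([Errno 98]) whenever the path already exists, whether or not anything is still listening on it. It may therefore help to check which of two cases applies:

- Another homestead-prov instance is still serving on the socket.
- The socket file is just left over from a process that exited.

This is a minimal sketch, and the function name is hypothetical.

```python
import os
import socket


def unix_socket_state(path):
    """Classify a UNIX-domain socket path as 'absent', 'live' or 'stale'.

    'live'  - a process accepts connections on it (e.g. a second instance).
    'stale' - the file exists but nothing is listening (left-over file).
    In both of the last two cases a new bind() to the path fails with
    [Errno 98] Address already in use.
    """
    if not os.path.exists(path):
        return "absent"
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        s.connect(path)
        return "live"
    except ConnectionRefusedError:
        return "stale"
    finally:
        s.close()
```

How to read the result of `unix_socket_state("/tmp/.homestead-prov-sock-0")`:

- "live" would fit the two-instances scenario Ellie describes below.
- "stale" would point to a left-over socket file instead.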
I am planning to rebuild the Homestead node and reinsert the numbers using the
Bulk-Provisioning method. Do you think that would help?
Actually, we used to have a working deployment (Sprint Pacman) with the same
configuration. Is there a way to install a previous version?
Full logs are attached.
Thanks,
Lianjie
On Mon, Feb 9, 2015 at 3:09 PM, Eleanor Merry <[email protected]> wrote:
> Hi Lianjie,
>
> I’m glad to hear that Sprout and Chronos are now working!
>
> For the Cassandra issue, looking in the logs there are a number of cases of
> CorruptSSTableExceptions. I’ve not seen this before, but I believe you can
> use nodetool scrub or sstablescrub to fix up any corruption.
>
> Also, how are you stopping Homer, Homestead-prov and Homestead? When you
> stop the service, you should stop both the service and its associated
> poll_* script (e.g. “sudo monit stop poll_homestead”), and you shouldn’t
> restart the service using “sudo service <service> restart”, as this can
> cause issues where two versions of the service start up.
>
> Ellie
>
> *From:* Lianjie Cao [mailto:[email protected]]
> *Sent:* 06 February 2015 20:32
> *To:* Eleanor Merry
> *Cc:* [email protected]
> *Subject:* Re: [Clearwater] Problems with Sprout clustering and Homestead failure
>
> Hi Ellie,
>
> Thanks a lot for the response!
>
> I modified the Sprout and Chronos configurations, and they are working
> correctly now!
>
> I checked Cassandra on the Homestead node. The log does show a few errors
> during initialization, but it eventually started successfully. The
> Cassandra, Homestead and Homestead-prov logs are attached.
>
> I actually ran into the same problem before, but after rebooting the
> Homestead node a few times it worked fine, so I didn't dig into it.
>
> Is it possible that the problem is due to start-up conflicts among
> Cassandra, Homestead and Homestead-prov?
>
> Thanks,
>
> Lianjie
>
> On Wed, Feb 4, 2015 at 3:25 PM, Eleanor Merry <[email protected]> wrote:
>
> Hi Lianjie,
>
> Your configuration files aren't quite right.
>
> The cluster_settings file should have the form servers=<address>,<address>
> - so in your case it would be "servers=192.168.1.21:11211,192.168.1.22:11211".
> This file should be identical on each Sprout node (so the Sprouts must be
> listed in the same order on each node).
>
> The chronos.conf file should have one localhost entry, which is set to the
> IP address of the local node, and multiple node entries, which are set to
> the IP addresses of each node in the cluster. In your case, this would be
> (on sprout 1):
>
> [cluster]
> localhost = 192.168.1.21
> node = 192.168.1.21
> node = 192.168.1.22
>
> The order of the nodes must be the same on each node - so the file on
> sprout 2 should be:
>
> [cluster]
> localhost = 192.168.1.22
> node = 192.168.1.21
> node = 192.168.1.22
>
> Can you make these changes to the config files, and then reload Sprout and
> Chronos (sudo service <service> reload)?
>
> In the logs below, Homestead has stopped because it couldn't contact
> cassandra:
>
> 04-02-2015 18:42:19.616 UTC Error cassandra_store.cpp:207: Cache caught TTransportException: connect() failed: Connection refused
> 04-02-2015 18:42:19.616 UTC Error main.cpp:550: Failed to initialize cache - rc 3
> 04-02-2015 18:42:19.616 UTC Status cassandra_store.cpp:185: Stopping cache
>
> Can you check whether Cassandra is running reliably on the Homestead node?
> Does /var/monit/monit.log show that monit is restarting it, and are there
> any logs in /var/log/cassandra?
>
> Ellie
>
> -----Original Message-----
> From: [email protected] [mailto:
> [email protected]] On Behalf Of Lianjie Cao
> Sent: 04 February 2015 19:37
> To: [email protected]
> Subject: [Clearwater] Problems with Sprout clustering and Homestead failure
>
> Hi,
>
> We recently built a Clearwater deployment with one Bono node, two Sprout
> nodes, one Homestead node, one Homer node and one Ralf node. However, we
> ran into some problems with Homestead failing to start and with Sprout
> clustering.
>
> *Sprout clustering:*
>
> The manual installation instructions say that in the latest version Sprout
> clustering is done by Chronos, and that to add or remove a Sprout node,
> /etc/chronos/chronos.conf needs to be modified accordingly.
> However, we found that without a chronos.conf file, the two Sprout nodes
> seem to work fine once the IPs of both Sprout nodes are added to
> /etc/clearwater/cluster_settings.
>
> [sprout]cw@sprout-2:~$ cat /etc/clearwater/cluster_settings
> servers=192.168.1.21:11211
> servers=192.168.1.22:11211
>
> But if we do add /etc/chronos/chronos.conf with the details of the two
> Sprout nodes, as below, Chronos fails and no new log files appear under
> /var/log/chronos.
>
> [sprout]cw@sprout-1:/var/log/chronos$ cat /etc/chronos/chronos.conf
> [http]
> bind-address = 0.0.0.0
> bind-port = 7253
>
> [logging]
> folder = /var/log/chronos
> level = 5
>
> [cluster]
> localhost = 192.168.1.21
> node = localhost
>
> sprout-2 = 192.168.1.22
> node = sprout-2
>
> [alarms]
> enabled = true
>
>
> [sprout]cw@sprout-1:~$ sudo monit status
> The Monit daemon 5.8.1 uptime: 0m
>
> Program 'poll_sprout'
> status Status ok
> monitoring status Monitored
> last started Wed, 04 Feb 2015 11:20:36
> last exit value 0
> data collected Wed, 04 Feb 2015 11:20:36
>
> Process 'sprout'
> status Running
> monitoring status Monitored
> pid 1157
> parent pid 1
> uid 999
> effective uid 999
> gid 999
> uptime 1m
> children 0
> memory kilobytes 42412
> memory kilobytes total 42412
> memory percent 1.0%
> memory percent total 1.0%
> cpu percent 0.4%
> cpu percent total 0.4%
> data collected Wed, 04 Feb 2015 11:20:36
>
> Program 'poll_memcached'
> status Status ok
> monitoring status Monitored
> last started Wed, 04 Feb 2015 11:20:36
> last exit value 0
> data collected Wed, 04 Feb 2015 11:20:36
>
> Process 'memcached'
> status Running
> monitoring status Monitored
> pid 1092
> parent pid 1
> uid 108
> effective uid 108
> gid 114
> uptime 1m
> children 0
> memory kilobytes 1180
> memory kilobytes total 1180
> memory percent 0.0%
> memory percent total 0.0%
> cpu percent 0.0%
> cpu percent total 0.0%
> data collected Wed, 04 Feb 2015 11:20:36
>
> Process 'clearwater_diags_monitor'
> status Running
> monitoring status Monitored
> pid 1072
> parent pid 1
> uid 0
> effective uid 0
> gid 0
> uptime 1m
> children 1
> memory kilobytes 1796
> memory kilobytes total 2172
> memory percent 0.0%
> memory percent total 0.0%
> cpu percent 0.0%
> cpu percent total 0.0%
> data collected Wed, 04 Feb 2015 11:20:36
>
> Process 'chronos'
> status Execution failed
> monitoring status Monitored
> data collected Wed, 04 Feb 2015 11:20:26
>
> System 'sprout-1'
> status Running
> monitoring status Monitored
> load average [0.20] [0.09] [0.04]
> cpu 6.8%us 1.1%sy 0.0%wa
> memory usage 116944 kB [2.8%]
> swap usage 0 kB [0.0%]
> data collected Wed, 04 Feb 2015 11:20:26
>
>
> Is it because we are not using Chronos in the right way, or are there other
> settings we need to change?
>
> *Homestead Failure:*
>
> When we use SIPp to run user registration tests, we receive a "403
> Forbidden" response, and we observe errors on both Sprout nodes.
>
> [sprout]cw@sprout-1:~$ cat /var/log/sprout/sprout_current.txt
> 04-02-2015 18:54:50.884 UTC Warning acr.cpp:627: Failed to send Ralf ACR message (0x7fce241cd780), rc = 400
> 04-02-2015 18:54:51.083 UTC Error httpconnection.cpp:573: http://hs.hp-clearwater.com:8888/impi/6500000008%40hp-clearwater.com/av?impu=sip%3A6500000008%40hp-clearwater.com failed at server 192.168.1.31 : Timeout was reached (28) : fatal
> 04-02-2015 18:54:51.083 UTC Error httpconnection.cpp:688: cURL failure with cURL error code 28 (see man 3 libcurl-errors) and HTTP error code 500
> 04-02-2015 18:54:51.083 UTC Error hssconnection.cpp:145: Failed to get Authentication Vector for [email protected]
> 04-02-2015 18:54:51.086 UTC Error httpconnection.cpp:688: cURL failure with cURL error code 0 (see man 3 libcurl-errors) and HTTP error code 400
> 04-02-2015 18:54:51.086 UTC Warning acr.cpp:627: Failed to send Ralf ACR message (0x14322c0), rc = 400
> 04-02-2015 18:54:51.282 UTC Error httpconnection.cpp:573: http://hs.hp-clearwater.com:8888/impi/6500000009%40hp-clearwater.com/av?impu=sip%3A6500000009%40hp-clearwater.com failed at server 192.168.1.31 : Timeout was reached (28) : fatal
> 04-02-2015 18:54:51.283 UTC Error httpconnection.cpp:688: cURL failure with cURL error code 28 (see man 3 libcurl-errors) and HTTP error code 500
> 04-02-2015 18:54:51.283 UTC Error hssconnection.cpp:145: Failed to get Authentication Vector for [email protected]
> 04-02-2015 18:54:51.286 UTC Error httpconnection.cpp:688: cURL failure with cURL error code 0 (see man 3 libcurl-errors) and HTTP error code 400
> 04-02-2015 18:54:51.286 UTC Warning acr.cpp:627: Failed to send Ralf ACR message (0x7fce1c1fdef0), rc = 400 ....
>
> It seems that Homestead is unreachable.
> Checking the status on the Homestead node using monit gives:
>
> [homestead]cw@homestead-1:~$ sudo monit status
> The Monit daemon 5.8.1 uptime: 15m
>
> Process 'nginx'
> status Running
> monitoring status Monitored
> pid 1044
> parent pid 1
> uid 0
> effective uid 0
> gid 0
> uptime 15m
> children 4
> memory kilobytes 1240
> memory kilobytes total 8448
> memory percent 0.0%
> memory percent total 0.2%
> cpu percent 0.0%
> cpu percent total 0.0%
> port response time 0.000s to 127.0.0.1:80/ping [HTTP via TCP]
> data collected Wed, 04 Feb 2015 10:58:02
>
> Program 'poll_homestead'
> status Status failed
> monitoring status Monitored
> last started Wed, 04 Feb 2015 10:58:02
> last exit value 1
> data collected Wed, 04 Feb 2015 10:58:02
>
> Process 'homestead'
> status Does not exist
> monitoring status Monitored
> data collected Wed, 04 Feb 2015 10:58:02
>
> Program 'poll_homestead-prov'
> status Status ok
> monitoring status Monitored
> last started Wed, 04 Feb 2015 10:58:02
> last exit value 0
> data collected Wed, 04 Feb 2015 10:58:02
>
> Process 'homestead-prov'
> status Execution failed
> monitoring status Monitored
> data collected Wed, 04 Feb 2015 10:58:32
>
> Process 'clearwater_diags_monitor'
> status Running
> monitoring status Monitored
> pid 1027
> parent pid 1
> uid 0
> effective uid 0
> gid 0
> uptime 16m
> children 1
> memory kilobytes 1664
> memory kilobytes total 2040
> memory percent 0.0%
> memory percent total 0.0%
> cpu percent 0.0%
> cpu percent total 0.0%
> data collected Wed, 04 Feb 2015 10:58:32
>
> Program 'poll_cassandra_ring'
> status Status ok
> monitoring status Monitored
> last started Wed, 04 Feb 2015 10:58:32
> last exit value 0
> data collected Wed, 04 Feb 2015 10:58:32
>
> Process 'cassandra'
> status Running
> monitoring status Monitored
> pid 1280
> parent pid 1277
> uid 106
> effective uid 106
> gid 113
> uptime 16m
> children 0
> memory kilobytes 1388648
> memory kilobytes total 1388648
> memory percent 34.3%
> memory percent total 34.3%
> cpu percent 0.4%
> cpu percent total 0.4%
> data collected Wed, 04 Feb 2015 10:58:32
>
> System 'homestead-1'
> status Running
> monitoring status Monitored
> load average [0.00] [0.04] [0.05]
> cpu 3.0%us 0.8%sy 0.0%wa
> memory usage 1505324 kB [37.1%]
> swap usage 0 kB [0.0%]
> data collected Wed, 04 Feb 2015 10:58:32
>
> And the log file shows:
>
> [homestead]cw@homestead-1:~$ cat /var/log/homestead-prov/homestead-prov-err.log
> Traceback (most recent call last):
>   File "/usr/lib/python2.7/runpy.py", line 162, in _run_module_as_main
>     "__main__", fname, loader, pkg_name)
>   File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
>     exec code in run_globals
>   File "/usr/share/clearwater/homestead/env/lib/python2.7/site-packages/crest-0.1-py2.7.egg/metaswitch/crest/main.py", line 156, in <module>
>     standalone()
>   File "/usr/share/clearwater/homestead/env/lib/python2.7/site-packages/crest-0.1-py2.7.egg/metaswitch/crest/main.py", line 119, in standalone
>     reactor.listenUNIX(unix_sock_name, application)
>   File "/usr/share/clearwater/homestead/env/local/lib/python2.7/site-packages/Twisted-12.3.0-py2.7-linux-x86_64.egg/twisted/internet/posixbase.py", line 413, in listenUNIX
>     p.startListening()
>   File "/usr/share/clearwater/homestead/env/local/lib/python2.7/site-packages/Twisted-12.3.0-py2.7-linux-x86_64.egg/twisted/internet/unix.py", line 293, in startListening
>     raise CannotListenError, (None, self.port, le)
> twisted.internet.error.CannotListenError: Couldn't listen on any:/tmp/.homestead-prov-sock-0: [Errno 98] Address already in use.
> ......
>
> [homestead]cw@homestead-1:~$ cat /var/log/homestead-prov/homestead-prov-0.log
> 2015-02-04 18:42:23,476 UTC INFO main:118 Going to listen for HTTP on UNIX socket /tmp/.homestead-prov-sock-0
> 2015-02-04 18:42:24,087 UTC INFO main:118 Going to listen for HTTP on UNIX socket /tmp/.homestead-prov-sock-0
> 2015-02-04 18:42:35,826 UTC INFO main:118 Going to listen for HTTP on UNIX socket /tmp/.homestead-prov-sock-0
> 2015-02-04 18:43:16,205 UTC INFO main:118 Going to listen for HTTP on UNIX socket /tmp/.homestead-prov-sock-0 ......
>
> homestead_20150204T180000Z.txt homestead_current.txt
> [homestead]cw@homestead-1:~$ cat /var/log/homestead/homestead_current.txt
> 04-02-2015 18:42:19.586 UTC Status main.cpp:468: Log level set to 2
> 04-02-2015 18:42:19.602 UTC Status main.cpp:489: Access logging enabled to /var/log/homestead
> 04-02-2015 18:42:19.614 UTC Status load_monitor.cpp:93: Constructing LoadMonitor
> 04-02-2015 18:42:19.614 UTC Status load_monitor.cpp:94: Target latency (usecs) : 100000
> 04-02-2015 18:42:19.614 UTC Status load_monitor.cpp:95: Max bucket size : 20
> 04-02-2015 18:42:19.614 UTC Status load_monitor.cpp:96: Initial token fill rate/s: 10.000000
> 04-02-2015 18:42:19.614 UTC Status load_monitor.cpp:97: Min token fill rate/s : 10.000000
> 04-02-2015 18:42:19.614 UTC Status dnscachedresolver.cpp:90: Creating Cached Resolver using server 127.0.0.1
> 04-02-2015 18:42:19.614 UTC Status httpresolver.cpp:50: Created HTTP resolver
> 04-02-2015 18:42:19.614 UTC Status cassandra_store.cpp:145: Configuring store
> 04-02-2015 18:42:19.614 UTC Status cassandra_store.cpp:146: Hostname: localhost
> 04-02-2015 18:42:19.614 UTC Status cassandra_store.cpp:147: Port: 9160
> 04-02-2015 18:42:19.614 UTC Status cassandra_store.cpp:148: Threads: 10
> 04-02-2015 18:42:19.614 UTC Status cassandra_store.cpp:149: Max Queue: 0
> 04-02-2015 18:42:19.614 UTC Status cassandra_store.cpp:199: Starting store
> 04-02-2015 18:42:19.616 UTC Error cassandra_store.cpp:207: Cache caught TTransportException: connect() failed: Connection refused
> 04-02-2015 18:42:19.616 UTC Error main.cpp:550: Failed to initialize cache - rc 3
> 04-02-2015 18:42:19.616 UTC Status cassandra_store.cpp:185: Stopping cache
> 04-02-2015 18:42:19.616 UTC Status cassandra_store.cpp:226: Waiting for cache to stop ......
>
> And the port usage is:
>
> [homestead]cw@homestead-1:~$ sudo netstat -tulpn
> Active Internet connections (only servers)
> Proto Recv-Q Send-Q Local Address Foreign Address State PID/Program name
> tcp 0 0 127.0.0.1:9042 0.0.0.0:* LISTEN 1280/jsvc.exec
> tcp 0 0 0.0.0.0:53 0.0.0.0:* LISTEN 952/dnsmasq
> tcp 0 0 0.0.0.0:22 0.0.0.0:* LISTEN 827/sshd
> tcp 0 0 127.0.0.1:7000 0.0.0.0:* LISTEN 1280/jsvc.exec
> tcp 0 0 127.0.0.1:2812 0.0.0.0:* LISTEN 1036/monit
> tcp 0 0 0.0.0.0:37791 0.0.0.0:* LISTEN 1280/jsvc.exec
> tcp 0 0 0.0.0.0:7199 0.0.0.0:* LISTEN 1280/jsvc.exec
> tcp 0 0 0.0.0.0:53313 0.0.0.0:* LISTEN 1280/jsvc.exec
> tcp 0 0 127.0.0.1:9160 0.0.0.0:* LISTEN 1280/jsvc.exec
> tcp6 0 0 :::53 :::* LISTEN 952/dnsmasq
> tcp6 0 0 :::22 :::* LISTEN 827/sshd
> tcp6 0 0 :::8889 :::* LISTEN 1044/nginx
> tcp6 0 0 :::80 :::* LISTEN 1044/nginx
> udp 0 0 0.0.0.0:13344 0.0.0.0:* 952/dnsmasq
> udp 0 0 0.0.0.0:48567 0.0.0.0:* 952/dnsmasq
> udp 0 0 0.0.0.0:53 0.0.0.0:* 952/dnsmasq
> udp 0 0 0.0.0.0:41016 0.0.0.0:* 952/dnsmasq
> udp 0 0 0.0.0.0:68 0.0.0.0:* 634/dhclient3
> udp 0 0 192.168.1.31:123 0.0.0.0:* 791/ntpd
> udp 0 0 127.0.0.1:123 0.0.0.0:* 791/ntpd
> udp 0 0 0.0.0.0:123 0.0.0.0:* 791/ntpd
> udp6 0 0 :::53 :::* 952/dnsmasq
> udp6 0 0 fe80::f816:3eff:fe7:123 :::* 791/ntpd
> udp6 0 0 ::1:123 :::* 791/ntpd
> udp6 0 0 :::123 :::* 791/ntpd
>
> So, how should we fix the problems with Homestead and Homestead-prov?
>
> Best regards,
> Lianjie
>
> _______________________________________________
> Clearwater mailing list
> [email protected]
> http://lists.projectclearwater.org/listinfo/clearwater