Hi Bud,
The smoking gun in the cluster_manager log would be the message “Calling plugin method CassandraPlugin.on_joining_cluster”. If that appears overnight, that’ll be the point where your subscribers were deleted.
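Here is a minimal Python sketch of that search, for illustration only. It assumes the cluster-manager logs are plain-text files under /var/log/clearwater-cluster-manager (the directory mentioned elsewhere in this thread), and it does not handle rotated or compressed files. `find_rejoin_events` and `scan_log_dir` are hypothetical helper names and are not part of Clearwater.

```python
from pathlib import Path
from typing import Iterable, List, Tuple

# The message Rob calls the "smoking gun" in the cluster-manager log.
SMOKING_GUN = "Calling plugin method CassandraPlugin.on_joining_cluster"


def find_rejoin_events(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line_number, line) for every line containing SMOKING_GUN."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(lines, start=1)
        if SMOKING_GUN in line
    ]


def scan_log_dir(log_dir: str = "/var/log/clearwater-cluster-manager") -> None:
    """Print every matching line in every regular file under log_dir.

    Assumption: the files are plain text. A missing directory simply
    produces no output.
    """
    for path in sorted(Path(log_dir).glob("*")):
        if not path.is_file():
            continue
        # errors="replace" stops undecodable bytes from aborting the scan.
        with path.open(errors="replace") as handle:
            for number, line in find_rejoin_events(handle):
                print(f"{path}:{number}: {line}")
```

On the node, calling `scan_log_dir()` prints each matching line with its file name and line number. From the shell, `grep -nF` with the same message string does the equivalent job. The Python version is only easier to extend, for example to print the lines around each match.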
The other thing you could try, to narrow down a timestamp, is running `ls -lrth /var/lib/cassandra/data/`. The modification times on those directories might suggest when the deletion and recreation happened.
You say that /var/log/cassandra/system.log has no real timestamps. A typical line from that file on my machine looks like “INFO [main] 2016-04-21 10:53:37,457 YamlConfigurationLoader.java (line 80) Loading settings from file:/etc/cassandra/cassandra.yaml”, so that sounds odd. Could you give an example of what you’re seeing?
Thanks,
Rob
From: Bud Asterisk [mailto:[email protected]]
Sent: 21 April 2016 02:05
To: Robert Day (projectclearwater.org) <[email protected]>
Cc: [email protected]
Subject: Re: [Project Clearwater] latest OVF AIO stable?
Rob,
The subscribers have now been deleted from CW. It happened in the last 10 hours and I am not sure when. The system.log.x files are 20 MB with no real timestamps in them, and the cluster_manager logs are about a meg each, so I am not sure which ones to send for the failure. Any hints on what to search for within each hour of log file?
I will install the latest OVF just released to see if there is any difference.
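On the question of what to search for within each hour: if the system.log lines do carry timestamps in the `2016-04-21 10:53:37,457` form Rob quotes, a minimal Python sketch like this one can cut a file down to a chosen time window. The helper names are hypothetical. Timestamps are compared as naive datetimes in whatever timezone Cassandra wrote them (an assumption worth checking), and lines without a timestamp, such as stack-trace continuations, are kept with the entry before them.

```python
import re
from datetime import datetime
from typing import Iterable, Iterator, Optional

# Matches the timestamp in lines such as:
# INFO [main] 2016-04-21 10:53:37,457 YamlConfigurationLoader.java (line 80) ...
TIMESTAMP = re.compile(r"\b(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3})\b")


def parse_timestamp(line: str) -> Optional[datetime]:
    """Return the first timestamp found in the line, or None."""
    match = TIMESTAMP.search(line)
    if match is None:
        return None
    stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
    # The three digits after the comma are milliseconds.
    return stamp.replace(microsecond=int(match.group(2)) * 1000)


def lines_between(lines: Iterable[str], start: datetime, end: datetime) -> Iterator[str]:
    """Yield lines logged in the half-open window [start, end).

    Lines without a timestamp inherit the timestamp of the most recent
    line that had one; lines before the first timestamp are skipped.
    """
    current: Optional[datetime] = None
    for line in lines:
        stamp = parse_timestamp(line)
        if stamp is not None:
            current = stamp
        if current is not None and start <= current < end:
            yield line
```

Passing a one-hour window at a time lets each hour of a 20 MB file be checked separately. If a file turns out to contain no such timestamps, the generator yields nothing, which is itself a quick way to confirm the odd format Bud describes.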
Bud,
On Tue, Apr 19, 2016 at 2:27 PM, Robert Day (projectclearwater.org) <[email protected]> wrote:
Hi Bud,
The Homestead logs are interesting here – there’s a bunch saying this:
18-04-2016 22:27:38.630 UTC Error cassandra_store.cpp:442: Cassandra request failed: rc=3, Exception: connect() failed: Connection refused [1]
18-04-2016 22:27:38.631 UTC Error cassandra_store.cpp:442: Cassandra request failed: rc=3, Exception: connect() failed: Connection refused [1]
18-04-2016 22:27:42.733 UTC Error cassandra_store.cpp:442: Cassandra request failed: rc=3, Exception: connect() failed: Connection refused [1]
18-04-2016 22:27:42.735 UTC Error cassandra_store.cpp:442: Cassandra request failed: rc=3, Exception: connect() failed: Connection refused [1]
18-04-2016 22:27:50.849 UTC Error cassandra_store.cpp:442: Cassandra request failed: rc=3, Exception: connect() failed: Connection refused [1]
18-04-2016 22:27:50.851 UTC Error cassandra_store.cpp:442: Cassandra request failed: rc=3, Exception: connect() failed: Connection refused [1]
and then a log saying:
18-04-2016 22:32:23.837 UTC Error cassandra_store.cpp:442: Cassandra request failed: rc=1, Exception: Default TException. [Keyspace 'homestead_cache' does not exist]
which suggests that Cassandra has come back up but lost all its data. This would explain why your subscribers stop working, if they’ve been deleted from the database!
One possible culprit for this is clearwater-cluster-manager, if it’s somehow got into a loop of reclustering and clearing out the database when it does so. Could you send /var/log/cassandra/system.log and the logs from /var/log/clearwater-cluster-manager (ideally from the same time period as these Homestead logs, so 22:00 UTC yesterday)?
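The pattern described here (a run of `Connection refused` errors, then a missing `homestead_cache` keyspace) can be pulled out of a Homestead log mechanically. This is a minimal Python sketch. It assumes each line starts with a `DD-MM-YYYY HH:MM:SS.mmm UTC` timestamp, as in the excerpts quoted here, and that lines are in chronological order. `CassandraTimeline` and `cassandra_timeline` are hypothetical names.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Iterable, Optional

# Matches the leading timestamp of Homestead log lines such as:
# 18-04-2016 22:27:38.630 UTC Error cassandra_store.cpp:442: ...
HOMESTEAD_TS = re.compile(r"^(\d{2}-\d{2}-\d{4} \d{2}:\d{2}:\d{2}\.\d{3}) UTC ")


@dataclass
class CassandraTimeline:
    # Naive datetimes; the log labels them as UTC.
    first_refused: Optional[datetime] = None
    last_refused: Optional[datetime] = None
    first_missing_keyspace: Optional[datetime] = None


def cassandra_timeline(lines: Iterable[str]) -> CassandraTimeline:
    """Record when Cassandra refused connections and when a keyspace went missing.

    Assumes chronological order, so the last refusal seen is the latest one.
    """
    timeline = CassandraTimeline()
    for line in lines:
        match = HOMESTEAD_TS.match(line)
        if match is None:
            continue
        stamp = datetime.strptime(match.group(1), "%d-%m-%Y %H:%M:%S.%f")
        if "Connection refused" in line:
            if timeline.first_refused is None:
                timeline.first_refused = stamp
            timeline.last_refused = stamp
        elif "Keyspace" in line and "does not exist" in line:
            if timeline.first_missing_keyspace is None:
                timeline.first_missing_keyspace = stamp
    return timeline
```

When run over a file from /var/log/homestead, the gap between `last_refused` and `first_missing_keyspace` gives a rough window in which Cassandra came back up. That window can then be compared against the Cassandra and cluster-manager logs.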
Thanks,
Rob
From: Bud Asterisk [mailto:[email protected]]
Sent: 19 April 2016 00:25
To: Robert Day (projectclearwater.org) <[email protected]>
Cc: [email protected]
Subject: Re: [Project Clearwater] latest OVF AIO stable?
Logs of the failure to register are attached. I provisioned 6135551212 from the CLI fine and it registered immediately. It is not clear whether it failed on the next re-register attempt or something else, but you can see the errors generated in the attached Sprout logs. My softclient will not re-register. If I were to provision a new user it would work fine, but then it would fail again.
On Mon, Apr 18, 2016 at 10:22 AM, Robert Day (projectclearwater.org) <[email protected]> wrote:
Hi Bud,
It sounds like we’ve got a good handle on what’s going on with the Ellis side of this issue: if you restart one of Ellis’s backend services (Homer or Homestead) while Ellis is running, the Ellis UI doesn’t recover from that failure until you log out and log in again.
While we fix that, you might want to look at creating users through our command-line tools instead (http://clearwater.readthedocs.org/en/stable/Provisioning_Subscribers.html). These are simpler tools with a bit more flexibility (e.g. you can set your own password rather than having one generated). The only possible downside is that they won’t set up things like call forwarding services in Homer, but it sounds like you’re primarily interested in Ralf testing.
You also mentioned that subscribers fail to register after a while, but I think that’s a different problem. Ellis is only involved in creating subscribers and provisioning them into the backend databases, so an Ellis failure shouldn’t make a working subscriber stop working.
Could you send across the Bono, Sprout and Homestead logs (/var/log/homestead, not /var/log/homestead-prov) for when a subscriber successfully registers, and then for when it fails to register? Those are the only three components involved in registration, so anything going wrong should show up in those logs.
Thanks,
Rob
From: Clearwater [mailto:[email protected]] On Behalf Of Chris Elford (projectclearwater.org)
Sent: 15 April 2016 17:07
To: Bud Asterisk <[email protected]>; Chris Elford (projectclearwater.org) <[email protected]>
Cc: [email protected]
Subject: Re: [Project Clearwater] latest OVF AIO stable?
Thanks Bud, I’ll look into that when I get back from my vacation.
Chris
From: Bud Asterisk [mailto:[email protected]]
Sent: 14 April 2016 13:57
To: Chris Elford (projectclearwater.org) <[email protected]>
Cc: [email protected]
Subject: Re: [Project Clearwater] latest OVF AIO stable?
Hi Chris,
Actually, with the first method you used, you think you are not creating users, but I think you actually are. Check your address book before you try to create the users and count how many entries there are, then try to create users. You will get the “failed to update” error. I did this a few times and then went back to the address book, and it showed that new numbers had been created!
Additionally, after some form of restart (of processes or of the whole system), users who existed before the restart are not always able to register. Since this issue is easy to reproduce now, I hope you find a fix soon :)
Bud,
On Thu, Apr 14, 2016 at 4:59 AM, Chris Elford (projectclearwater.org) <[email protected]> wrote:
Thanks Bud,
I’ve tried this on my own all-in-one node.
When I stop either homestead-prov or ellis (using `sudo service homestead-prov stop` or `sudo service ellis stop`) and then try to create a user, I get the following message: “Failed to update the server (see detailed diagnostics in developer console). Please refresh the page.” The user was not created. This makes sense: if either of those processes is not running, then it’s not possible to create subscribers. When the stopped process comes back, I am able to create subscribers and see their passwords.
I tried it again, but this time refreshed (as suggested by the error message). This showed me another “failed to update” message, and then a screen with none of my private identities. When I tried to add one, it took me back to the original page, with all of the passwords blanked out.
Were you hitting refresh? If so, you may be able to work around this problem by not doing that. Please let me know if that solves your problem. I’ll see if we can make the error text more useful.
Yours,
Chris
From: Bud Asterisk [mailto:[email protected]]
Sent: 12 April 2016 03:02
To: Chris Elford (projectclearwater.org) <[email protected]>
Cc: [email protected]
Subject: Re: [Project Clearwater] latest OVF AIO stable?
Just to add to that: I can log out as the user who was getting errors and whose sets were not registering, then sign up as a new user fine, create SIP accounts and make calls. Any time I make a config change to CW and restart something, though, that user becomes pooched. I can sign out again, sign up and repeat the process just fine :)
I was also able to load the Ralf SW on my AIO VM and point it at my CDF, but I have a few CDF bugs to fix: CW is sending a CER and my CEA needs a tweak.
On Mon, Apr 11, 2016 at 8:12 PM, Bud Asterisk <[email protected]> wrote:
Chris,
You may have to send me a fine cask ale for these test results ;) Logs attached.
When I bring it up, I can provision a subscriber fine and make test calls between subscribers. Then I installed the Ralf SW and changed the shared config to add the IP address and port used for Ralf. I restarted, and that is when things died again: “Failed to update” errors, and my sets won’t re-register.
On Mon, Apr 11, 2016 at 11:13 AM, Bud Asterisk <[email protected]> wrote:
Roger that... thanks! I will do a fresh VM install and capture it all today.
On Mon, Apr 11, 2016 at 10:40 AM, Chris Elford (projectclearwater.org) <[email protected]> wrote:
Hi Bud,
I’ve had a look through your logs. It looks like Ellis is trying to get the details of your numbers from homestead-prov so that it can display them, but some of the requests are failing with 404 errors. In order to figure out why that is happening, we will need to look at the homestead-prov logs. Homestead-prov is a separate process from Homestead; Ellis uses it to update and query subscriber information in Homestead’s database.
I think the next step is to:
• Delete all of your subscribers.
• Add one using Ellis (noting the time).
• Capture the logs for that time period for Ellis, Homestead and Homestead-prov (/var/log/ellis, /var/log/homestead and /var/log/homestead-prov).
• Send them over.
(I’ve added some more debug logging to make the logs easier to read, but we should be able to understand this without it.)
Chris
________________________________
From: Bud Asterisk [mailto:[email protected]]
Sent: 07 April 2016 20:38
To: Chris Elford (projectclearwater.org) <[email protected]>
Cc: [email protected]
Subject: Re: [Project Clearwater] latest OVF AIO stable?
I reloaded the OVF again and this time round I can create users, but the web UI reports “failed to update”. The numbers are created in the address book but the password is never displayed to me.
Logs attached.
On Thu, Apr 7, 2016 at 1:49 PM, Bud Asterisk <[email protected]> wrote:
Actually, the SIP clients stay registered for about 15 minutes and then won’t re-register. I was able to make a couple of calls back and forth between them. I was about to try to integrate Ralf, but the clients simply won’t register, so I can’t test it. Is there something blatantly wrong with what I am doing? I will do another install, set all the debug levels and send logs :)
On Wed, Apr 6, 2016 at 9:21 AM, Bud Asterisk <[email protected]> wrote:
Just to let you know, I have calls working on the current build. The UI is still a bit unstable. Next will be SIP trunking integration and Ralf integration. Thx!
On Tue, Apr 5, 2016 at 11:17 AM, Chris Elford (projectclearwater.org) <[email protected]> wrote:
It should be ready in the next few days; we will send out a release note when we have uploaded it.
I took a look at the logs you sent in your other email. I couldn’t see any INVITE messages arriving at the Sprout node. It may be that the log covers the wrong time period.
Chris
From: Bud Asterisk [mailto:[email protected]]
Sent: 05 April 2016 15:14
To: Chris Elford (projectclearwater.org) <[email protected]>
Cc: [email protected]
Subject: Re: [Project Clearwater] latest OVF AIO stable?
Thanks Chris, when is the new image due out?
On Tue, Apr 5, 2016 at 6:51 AM, Chris Elford (projectclearwater.org) <[email protected]> wrote:
Hi Bud,
I’ve just had a chat with some of the rest of the team. It looks like we’ve fixed a serious bug with the all-in-one image in the latest release. The latest OVF image should be a lot more stable.
Chris
From: Clearwater [mailto:[email protected]] On Behalf Of Chris Elford (projectclearwater.org)
Sent: 04 April 2016 16:23
To: Bud Asterisk <[email protected]>; [email protected]
Subject: Re: [Project Clearwater] latest OVF AIO stable?
Hi Bud,
We don’t keep any old OVA images in S3. We are due to cut a new image this week, which may solve your problems. In the meantime, I would be interested in starting to debug the problem. Can you please send me the log containing the ‘failed to update database’ errors? I’d also be keen to see what debug logs are produced by Sprout when you try to make a call. http://clearwater.readthedocs.org/en/stable/Troubleshooting_and_Recovery.html#sprout explains how to turn on debug logging.
Yours,
Chris
From: Clearwater [mailto:[email protected]] On Behalf Of Bud Asterisk
Sent: 04 April 2016 04:45
To: [email protected]
Subject: [Project Clearwater] latest OVF AIO stable?
Gang,
I decided to go the OVF route for now, on a single ESXi VM. Wow, that was quick to get up and running. It took about 45 minutes to download, install and get 2 SIP clients registered. It was a bit weird, as it took a few minutes for the SIP clients to register. Once they were registered I tried calling between them with no luck; I just got 403 or 408 back.
Now, after a couple of hours of running, even the provisioning GUI is not working well, with “failed to update the database” errors. I can no longer provision new numbers and the SIP sets won’t register. I will reinstall tomorrow morning, but is there an older release available to try this out on? The OVF image in the repository is rather recent.
Bud,
_______________________________________________
Clearwater mailing list
[email protected]
http://lists.projectclearwater.org/mailman/listinfo/clearwater_lists.projectclearwater.org
