Matt,

I created a new set of issues by turning down the EC2 instances over the 
weekend. They of course came back up with different IP addresses. I had to fix 
up DNS, and started to update the IP addresses in the various config files on 
the servers. I was actually able to follow the steps below, and create some 
test accounts. After attempting to register with a Bria client, I noticed there 
were still a number of config files that needed to be updated. As such, I 
decided to destroy the deployment (knife deployment delete -E clearwater) and 
create a new one. I also re-cloned the Clearwater chef repo.
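
One way to find the config files that still reference a stale address after a restart is to search for the old IP directly. The sketch below is only an illustration: the function name, the example addresses, and the directory to search are assumptions, not part of the Clearwater tooling.

```shell
#!/bin/sh
# Hypothetical helper: list files under a directory that still contain an
# old IP address. -F treats the address as a fixed string (dots are not
# wildcards) and -w avoids matching 10.0.0.1 inside 10.0.0.12.
# Usage: find_stale_ip OLD_IP DIRECTORY
find_stale_ip() {
    old_ip=$1
    search_dir=$2
    # -r recurse, -l print matching file names only, -- ends option parsing
    grep -rlwF -- "$old_ip" "$search_dir" 2>/dev/null
}
```

For example, "find_stale_ip 10.0.0.12 /etc" would list the files under /etc that still mention 10.0.0.12 (both values are placeholders).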

There was one other issue I ran into doing this. The Chef client was unable to 
update the Chef server:

ERROR: Server returned error for 
http://chef-server.<domain>:4000/cookbooks/apt/1.9.3, retrying 1/5 in 3s


The Chef server logs were showing:

merb : chef-server (api) : worker (port 4000) ~ Connection failed - user: chef 
- (Bunny::ProtocolError)


It turns out that RabbitMQ had lost the "chef" account as a result of the Chef 
server EC2 instance having been shut down. This issue is mentioned here:

http://codeblog.majakorpi.net/post/34180903354/chef-server-rabbitmq-trouble-with-ubuntu-12-04-1


The following workaround solved the issue:

rabbitmqctl add_vhost /chef
rabbitmqctl add_user chef <amqp password>
rabbitmqctl set_permissions -p /chef chef ".*" ".*" ".*"
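
The workaround above can be made safe to re-run by checking what RabbitMQ already has before adding anything. This is a minimal sketch, not something taken from the Chef or RabbitMQ docs: the helper names and the AMQP_PASSWORD variable (standing in for the <amqp password> placeholder) are assumptions. It relies only on "rabbitmqctl list_vhosts" and "rabbitmqctl list_users" printing one entry per line with the name in the first column.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the first column of stdin equals $1 on
# any line. Header and footer lines from rabbitmqctl simply do not match.
has_entry() {
    awk -v name="$1" '$1 == name { found = 1 } END { exit found ? 0 : 1 }'
}

# Re-create the /chef vhost and the chef user only if they are missing,
# then (re)apply the permissions from the workaround above.
# AMQP_PASSWORD is an assumed environment variable.
restore_chef_amqp() {
    if ! rabbitmqctl list_vhosts | has_entry /chef; then
        rabbitmqctl add_vhost /chef
    fi
    if ! rabbitmqctl list_users | has_entry chef; then
        rabbitmqctl add_user chef "$AMQP_PASSWORD"
    fi
    rabbitmqctl set_permissions -p /chef chef ".*" ".*" ".*"
}
```

Run as root on the Chef server, for example with "AMQP_PASSWORD=... restore_chef_amqp".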


Otherwise, the new deployment went smoothly. The DNS entries were correctly 
created this time. Ellis initially showed "failed to update server" but 
started working after a few minutes.

I'm now able to successfully place calls between two Bria clients! It works 
fine if the clients have recently registered, but after a while, calls are 
rejected. I'm guessing this is just a matter of tuning timers to keep pinholes 
open.

Thanks for all the assistance!

James





From: Matt Williams [mailto:[email protected]]
Sent: Monday, May 13, 2013 8:01 AM
To: Jackson, James; clearwater at lists.projectclearwater.org
Subject: RE: incorrect signup code ?

James,

Thanks for your email.

I've checked the DNS issue, and think it's very likely that you hit the issue 
fixed by pull request 11 (https://github.com/Metaswitch/chef/pull/11).  If you 
do turn up another system at some stage, please let me know if you see this 
issue again.

On the Cassandra issues you're seeing, please can you

* restart Cassandra by issuing "sudo service cassandra stop" (and monit 
will then restart it automatically) on both homestead and homer

* on homestead, run (as root)

HOMESTEAD_DIR=/usr/share/clearwater/homestead
# Only create the keyspace and schema if the keyspace is not already present.
if echo "describe KEYSPACE homestead;" | cqlsh -3 localhost 2>/dev/null | grep -q "CREATE KEYSPACE homestead"
then
    echo "Homestead database already exists in Cassandra, not configuring"
else
    echo "Initializing homestead Cassandra database"
    echo "create KEYSPACE homestead with strategy_class = 'SimpleStrategy' AND strategy_options:replication_factor = 2;" | cqlsh -3 localhost

    # Create the tables using the script shipped with homestead.
    cd $HOMESTEAD_DIR
    $HOMESTEAD_DIR/env/bin/python $HOMESTEAD_DIR/src/metaswitch/crest/tools/create_db.py
    cd -
fi

* on homer, run (as root)

HOMER_DIR=/usr/share/clearwater/homer
# Only create the keyspace and schema if the keyspace is not already present.
if echo "describe KEYSPACE homer;" | cqlsh -3 localhost 2>/dev/null | grep -q "CREATE KEYSPACE homer"
then
    echo "Homer database already exists in Cassandra, not configuring"
else
    echo "Initializing homer Cassandra database"
    echo "create KEYSPACE homer with strategy_class = 'SimpleStrategy' AND strategy_options:replication_factor = 2;" | cqlsh -3 localhost

    # Create the tables using the script shipped with homer.
    cd $HOMER_DIR
    $HOMER_DIR/env/bin/python $HOMER_DIR/src/metaswitch/crest/tools/create_db.py
    cd -
fi

These commands are based on the Debian package post-install scripts at 
https://github.com/Metaswitch/crest/blob/dev/debian/homestead.postinst and 
https://github.com/Metaswitch/crest/blob/dev/debian/homer.postinst.
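
Since the homestead and homer steps differ only in the component name, they could be folded into one function. The following is a sketch under that assumption, not the contents of the postinst scripts: the function names are made up, and the paths and CQL are the ones quoted in this email.

```shell
#!/bin/sh
# Build the keyspace-creation CQL used above for a given keyspace name.
create_keyspace_cql() {
    printf "create KEYSPACE %s with strategy_class = 'SimpleStrategy' AND strategy_options:replication_factor = 2;\n" "$1"
}

# Hypothetical wrapper: create the keyspace and schema for one component
# ("homestead" or "homer") only if the keyspace is missing.
init_crest_db() {
    component=$1
    dir=/usr/share/clearwater/$component
    if echo "describe KEYSPACE $component;" | cqlsh -3 localhost 2>/dev/null \
            | grep -q "CREATE KEYSPACE $component"
    then
        echo "$component database already exists in Cassandra, not configuring"
    else
        echo "Initializing $component Cassandra database"
        create_keyspace_cql "$component" | cqlsh -3 localhost
        # Run in a subshell so the caller's working directory is unchanged.
        ( cd "$dir" && "$dir/env/bin/python" "$dir/src/metaswitch/crest/tools/create_db.py" )
    fi
}
```

On homestead this would be run (as root) as "init_crest_db homestead", and on homer as "init_crest_db homer".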

After doing this, you should find that the cqlsh commands that previously 
failed now succeed, and hopefully you'll be able to create subscribers through 
ellis.

Also, note that my comment below about starting/stopping services was wrong.  
We use monit to monitor our processes, so if you stop a component it will be 
restarted fairly quickly.  Instead, you should use "monit (start|stop) 
(homestead|homer)".  As above, this isn't actually necessary for the problem 
you're seeing, but I'm letting you know just in case you need it in future.
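
As an illustration of the monit syntax, here is a small sketch that builds the command and rejects anything other than the actions and components named above. The function name is hypothetical, and the restriction to start/stop reflects only what is mentioned in this email, not the full set of monit actions.

```shell
#!/bin/sh
# Hypothetical helper: print the monit command for a component, accepting
# only the actions and components named in this email. Returns non-zero
# (with a message on stderr) for anything else.
monit_command() {
    case $1 in
        start|stop) ;;
        *) echo "unknown action: $1" >&2; return 1 ;;
    esac
    case $2 in
        homestead|homer) ;;
        *) echo "unknown component: $2" >&2; return 1 ;;
    esac
    echo "monit $1 $2"
}
```

For example, "monit_command stop homer" prints "monit stop homer", which can then be run as root.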

Thanks,

Matt

From: Jackson, James [mailto:[email protected]]
Sent: 10 May 2013 22:14
To: Matt Williams; clearwater at lists.projectclearwater.org
Subject: RE: incorrect signup code ?

It seems to be Cassandra. "homer" and "homestead" keyspaces do not exist.

Thanks,
James

From: Matt Williams [mailto:[email protected]]
Sent: Friday, May 10, 2013 3:17 PM
To: Jackson, James; clearwater at lists.projectclearwater.org
Subject: RE: incorrect signup code ?

James,

I'll take a look at the DNS issue on Monday - there is a use_subdomain setting 
that changes the behavior here, and with which I found and fixed a bug today, 
so this could be related.  Manually setting up the domain names as you have 
done is a good workaround.

One possible cause for the timeout behavior you're seeing is Cassandra being in 
an odd state.  Cassandra is the backing store for homer and homestead and I 
have seen instances where chef's automated install leaves it without a schema.  
To check that it's functioning properly, can you try running "cqlsh -3" from 
the command-line?  This will get you into Cassandra's CQL shell.  You can then 
type "use homer;" or "use homestead;" to set the database to access and then 
"select * from simservs;" (on homer) or "select * from sip_digests;" (on 
homestead).  Note that while these commands look like SQL, they're actually a 
bit more restrictive than that (since Cassandra is a NoSQL store).  Anyway, 
these commands should all succeed, although there probably won't be any data in 
the store yet.  If they fail, this is likely to be the problem, and we should 
be able to fix that up.  I'll add this debugging information to the wiki and, 
if this does turn out to be the problem, look into improving our logging in 
this scenario.  Please let me know how you get on.
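
The same check can be run without an interactive session by piping the statements into cqlsh. This is a rough sketch with made-up function names. It only prints whatever cqlsh reports, so the output still needs to be read for errors.

```shell
#!/bin/sh
# Build the CQL for the check described above: select a keyspace, then
# read from one of its tables.
schema_check_cql() {
    printf 'use %s; select * from %s;\n' "$1" "$2"
}

# Hypothetical wrapper: run the check against the local Cassandra node and
# show cqlsh's output (including any error messages).
run_schema_check() {
    schema_check_cql "$1" "$2" | cqlsh -3 localhost
}
```

On homer this would be "run_schema_check homer simservs", and on homestead "run_schema_check homestead sip_digests".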

You can start and stop the services by typing "service (homestead|homer) 
(start|stop)".  Note that you need to be root for this (or execute it under 
sudo).  It's a good point that this isn't covered in our docs at present - I'll 
get that added.

Thanks,

Matt
________________________________
From: Jackson, James [[email protected]]
Sent: 10 May 2013 19:56
To: Matt Williams; clearwater at lists.projectclearwater.org
Subject: RE: incorrect signup code ?

Thanks for the detailed response!

Re-running the python script on Ellis, it indicates that the 1000 numbers are 
already present in the database.

Ellis logs indicate "HTTP 599: Timeout" communicating with 
homer.clearwater.<domain> port 7888 and hs.clearwater.<domain> port 8888. The 
first issue here is that these do not exist in DNS. There are only 
homer.<domain> and hs.<domain>. I've added these variants to DNS, using the 
internal IP addresses. The timeout still persists. Telnet from Ellis to 
homer.clearwater.<domain> port 7888 succeeds. Telnet from Ellis to 
hs.clearwater.<domain> port 8888 succeeds.
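
The DNS and telnet checks above could be scripted roughly as follows. This is a sketch under several assumptions: the function names are invented, the argument stands in for the <domain> placeholder, and it relies on getent plus an nc that supports -z (connect without sending data) and -w (timeout).

```shell
#!/bin/sh
# Print the "host port" pairs that Ellis contacts, per the log messages
# above. The argument stands in for the <domain> placeholder.
ellis_backends() {
    printf 'homer.clearwater.%s 7888\n' "$1"
    printf 'hs.clearwater.%s 8888\n' "$1"
}

# Hypothetical check: resolve each name, then try a TCP connection.
check_backends() {
    ellis_backends "$1" | while read -r host port; do
        if ! getent hosts "$host" >/dev/null; then
            echo "$host: no DNS entry"
        # </dev/null keeps nc from consuming the loop's input
        elif nc -z -w 5 "$host" "$port" </dev/null; then
            echo "$host:$port reachable"
        else
            echo "$host:$port not reachable"
        fi
    done
}
```

A successful connection only shows that the port is reachable. As seen here, the service behind it may still fail to answer requests.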

Looking at the homer and homestead logs confirms that the requests are 
arriving, but there are no responses.

Also, is there an official way to stop/start the various services? I may be 
missing it in the docs.

Thanks,
James



From: Matt Williams [mailto:[email protected]]
Sent: Friday, May 10, 2013 4:22 AM
To: Jackson, James; Jackson, James; clearwater at lists.projectclearwater.org
Subject: RE: incorrect signup code ?

James,

Well deduced - yes, you're right to use the "signup_key" specified in the 
configuration file during installation rather than "vby77rb7e".  I've fixed up 
the documentation - thanks for pointing this out.

On the "Failed to update server" error, there are two common reasons for this.


* ellis not having any directory numbers to allocate - This can happen 
if the "python create_numbers.py" script (described at 
https://github.com/Metaswitch/clearwater-docs/wiki/Manual%20Install#ellis) 
failed or wasn't run.  It's safe to re-run, so it's probably worth trying it 
again, just in case.


* ellis not being able to communicate with homer or homestead - This 
can happen if the domain name or IP address of homer or homestead is 
misconfigured on ellis, or if there is firewall configuration blocking traffic 
on ports 7888 or 8888.  It's worth double-checking your configuration but if 
there's nothing obviously wrong, the /var/log/ellis/ellis-0.log and 
/var/log/ellis/ellis-1.log files will report ellis's interactions with these 
servers and which server it's failing to communicate with.

If neither of these is the cause, it's possible that ellis is successfully 
sending a request to homer or homestead, but that homer or homestead is 
rejecting it.  The /var/log/ellis/ellis-*.log files mentioned above should 
point out which of homer or homestead is doing this.  Once you've established 
this, you can look at the /var/log/homestead/homestead-0.log or 
/var/log/homer/homer-0.log files for more information.  If this doesn't shed 
any light on the problem, please could you share the log files, and I'll take a 
look?
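
To narrow down which log file is reporting failures, a per-file count of matching lines can help. The sketch below uses a hypothetical helper name and assumes a grep that supports -H. The pattern is whatever error text is of interest, for instance the "HTTP 599" timeout reported elsewhere in this thread. Nothing is assumed about the log format beyond that string appearing on the line.

```shell
#!/bin/sh
# Hypothetical helper: for each log file given, print "file:count" where
# count is the number of lines containing the fixed string PATTERN.
# Usage: count_log_matches PATTERN FILE...
count_log_matches() {
    pattern=$1
    shift
    # -c count lines, -H always print the file name, -F fixed-string match
    grep -cHF -- "$pattern" "$@"
}
```

For example, on ellis: count_log_matches "HTTP 599" /var/log/ellis/ellis-*.log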

Thanks,

Matt

From: clearwater-bounces at lists.projectclearwater.org 
[mailto:[email protected]] On Behalf Of Jackson, 
James
Sent: 10 May 2013 05:33
To: ATT - James Jackson; clearwater at lists.projectclearwater.org
Subject: Re: [Clearwater] incorrect signup code ?

An API key was specified in a file during installation. Using that key allows 
the account to be created, but there's an error "Failed to update server" when 
trying to add a number.

From: clearwater-bounces at lists.projectclearwater.org 
[mailto:[email protected]] On Behalf Of Jackson, 
James
Sent: Thursday, May 09, 2013 10:43 PM
To: clearwater at lists.projectclearwater.org
Subject: [Clearwater] incorrect signup code ?

I've installed Clearwater (automated install) and I'm trying to set up an 
account as described here:

https://github.com/Metaswitch/clearwater-docs/wiki/Making%20your%20first%20call

It says to use signup code "vby77rb7e", but that comes back as an incorrect 
code.

What should we be using?

Thanks,
James
