James,

Great news that you have a deployment up and running!  Apologies that this has 
been a bit more painful than it might have been, but thanks for your feedback - 
hopefully we'll be able to iron these issues out.

On stopping/restarting nodes, yes, it's unfortunate that IP addresses change 
when this happens.  If you have an unclustered deployment, you should be able 
to run 'sudo chef-client' on each node to fix it up.  Unfortunately, this 
doesn't work for clustered deployments, so we generally don't stop/restart 
nodes.  For test deployments, it's just as quick to destroy and recreate
your deployment.  For production deployments, I think we'd like to add elastic
scaling (so you'd keep the deployment running, but reduce its capacity when
demand decreased), but this isn't available yet.  I'll see if there's
somewhere we can sensibly document not stopping/restarting EC2 instances.

Thanks for your discovery about stopping/restarting chef server - I'll add it 
to our troubleshooting documentation.

On the ellis issue, I have a suspicion that DNS records were still being 
propagated, and this is why data failed to be written to homer/homestead for 
the first minute or two.

On the Bria registrations failing after a few minutes, this could be due to
pinholes closing, or it could be due to registrations expiring, since sprout
enforces a maximum registration period of 5 minutes.  It signals this back to
the client using the "expires" parameter on the Contact header of the 200 OK
response.  Not all SIP clients honor this parameter: some ignore it and assume
that the expiry they requested has been accepted.  It's worth checking whether
the behavior still occurs if you configure Bria to register with an explicit
5 minute expiry.  Please let me know how you get on with this, as there may be
something we should document here.

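To make the client-side behavior concrete, here is a minimal Python sketch of
how a client could work out its re-registration timer from the 200 OK.  It
assumes a single Contact binding in the response and a hypothetical 30 second
refresh margin; the function names are illustrative only and are not taken
from sprout or Bria.

```python
import re
from typing import Optional

# Sprout's maximum registration period, as described above (5 minutes).
SPROUT_MAX_EXPIRES = 300

# Hypothetical client policy: re-register this many seconds before expiry.
REFRESH_MARGIN = 30


def granted_expiry(contact_header: str, expires_header: Optional[int]) -> Optional[int]:
    """Return the registration period (seconds) granted in a REGISTER 200 OK.

    Per RFC 3261 section 10.2.4, the "expires" parameter on the matching
    Contact takes precedence over the Expires header.  Assumes one binding.
    """
    # Only look at header parameters after the closing '>' of the URI, so
    # that URI parameters inside the angle brackets are not picked up.
    params = contact_header.rsplit(">", 1)[-1]
    match = re.search(r";\s*expires\s*=\s*(\d+)", params, re.IGNORECASE)
    if match:
        return int(match.group(1))
    return expires_header


def refresh_delay(granted: int) -> int:
    """Seconds to wait before re-registering (hypothetical policy)."""
    return max(1, granted - REFRESH_MARGIN)


# Illustration only: a client asked for 3600 seconds and the registrar
# capped the binding at sprout's maximum in the Contact "expires" parameter.
contact = f"<sip:6505550000@192.0.2.10:5060>;expires={min(3600, SPROUT_MAX_EXPIRES)}"
granted = granted_expiry(contact, 3600)
print(granted, refresh_delay(granted))  # prints "300 270"
```

A client that ignores the Contact "expires" parameter would instead wait close
to 3600 seconds before refreshing, by which time its 300 second binding has
long since expired.
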
Thanks,

Matt

From: Jackson, James [mailto:[email protected]]
Sent: 14 May 2013 16:12
To: Matt Williams; clearwater at lists.projectclearwater.org
Subject: RE: incorrect signup code ?


Matt,

I created a new set of issues by shutting down the EC2 instances over the
weekend. They of course came back up with different IP addresses. I had to fix 
up DNS, and started to update the IP addresses in the various config files on 
the servers. I was actually able to follow the steps below, and create some 
test accounts. After attempting to register with a Bria client, I noticed there 
were still a number of config files that needed to be updated. As such, I 
decided to destroy the deployment (knife deployment delete -E clearwater) and 
create a new one. I also re-cloned the Clearwater chef repo.

There was one other issue I ran into doing this. The Chef client was unable to 
update the Chef server:

ERROR: Server returned error for http://chef-server.<domain>:4000/cookbooks/apt/1.9.3, retrying 1/5 in 3s


The Chef server logs were showing:

merb : chef-server (api) : worker (port 4000) ~ Connection failed - user: chef - (Bunny::ProtocolError)


It turns out that RabbitMQ had lost the "chef" account as a result of the Chef
server EC2 instance having been shut down. This issue is mentioned here:

http://codeblog.majakorpi.net/post/34180903354/chef-server-rabbitmq-trouble-with-ubuntu-12-04-1


The following workaround solved the issue:

rabbitmqctl add_vhost /chef
rabbitmqctl add_user chef <amqp password>
rabbitmqctl set_permissions -p /chef chef ".*" ".*" ".*"

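The workaround above can be made safe to re-run by only recreating the account
when it is missing.  A minimal Python sketch follows, assuming it runs on the
Chef server as a user allowed to call rabbitmqctl, and assuming the
tab-separated "name<TAB>tags" output of 'rabbitmqctl list_users'; the function
names are hypothetical.

```python
import subprocess
from typing import List, Set


def parse_user_names(list_users_output: str) -> Set[str]:
    """Extract user names from 'rabbitmqctl list_users' output.

    Assumption: each user is printed as "<name><TAB><tags>"; informational
    lines such as "Listing users ..." and "...done." contain no tab, and
    newer RabbitMQ releases may print a "user<TAB>tags" header line.
    """
    names = set()
    for line in list_users_output.splitlines():
        fields = line.split("\t")
        if len(fields) < 2:
            continue  # informational line, not a user entry
        if fields[:2] == ["user", "tags"]:
            continue  # column header
        names.add(fields[0])
    return names


def restore_commands(password: str) -> List[List[str]]:
    """The three rabbitmqctl commands from the workaround above."""
    return [
        ["rabbitmqctl", "add_vhost", "/chef"],
        ["rabbitmqctl", "add_user", "chef", password],
        ["rabbitmqctl", "set_permissions", "-p", "/chef", "chef", ".*", ".*", ".*"],
    ]


def restore_chef_account(password: str) -> bool:
    """Recreate the chef vhost, user and permissions if the user is missing.

    Returns True if the commands were run, False if "chef" already existed.
    """
    listing = subprocess.run(
        ["rabbitmqctl", "list_users"], capture_output=True, text=True, check=True
    )
    if "chef" in parse_user_names(listing.stdout):
        return False
    for cmd in restore_commands(password):
        # The /chef vhost may have survived even if the user did not, so a
        # failing add_vhost is tolerated; any other failure raises.
        subprocess.run(cmd, check=cmd[1] != "add_vhost")
    return True
```

Running something like restore_chef_account("<amqp password>") as root after
the Chef server restarts would then only apply the three commands when
RabbitMQ has actually lost the "chef" user.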

Otherwise, the new deployment went smoothly. The DNS entries were correctly
created this time. Ellis initially showed "failed to update server" but
started working after a few minutes.

I'm now able to successfully place calls between two Bria clients! It works fine
if the clients have recently registered, but after a while, calls are rejected. 
I'm guessing this is just a matter of tuning timers to keep pinholes open.

Thanks for all the assistance!

James