James,

Great news that you have a deployment up and running! Apologies that this has been a bit more painful than it might have been, but thanks for your feedback - hopefully we'll be able to iron these issues out.

On stopping/restarting nodes, yes, it's unfortunate that IP addresses change when this happens. If you have an unclustered deployment, you should be able to run 'sudo chef-client' on each node to fix it up. Unfortunately, this doesn't work for clustered deployments, so we generally don't stop/restart nodes. For test deployments, it's actually just as quick to destroy/recreate your deployment. For production deployments, I think we'd like to add elastic scaling (so you'd keep the deployment running, but reduce its capacity when demand decreases), but this isn't present yet. I'll see if there's somewhere we can sensibly document not stopping/restarting EC2 instances.

Thanks for your discovery about stopping/restarting chef server - I'll add it to our troubleshooting documentation.

On the ellis issue, I have a suspicion that DNS records were still being propagated, and this is why data failed to be written to homer/homestead for the first minute or two.

On the Bria registrations failing after a few minutes, this could be due to pinholes closing, or it could be due to registrations expiring - sprout mandates a maximum registration expiry of 5 minutes, signalled back to the client using the "expires" parameter on the Contact header. Not all SIP clients honor this parameter - some ignore it and assume that the expires value on the request has been accepted. It's worth checking whether the behavior still occurs if you configure Bria to register with an explicit 5 minute expiry. Please let me know how you get on with this - there may be something we should document here.

Thanks,

Matt

From: Jackson, James [mailto:[email protected]]
Sent: 14 May 2013 16:12
To: Matt Williams; clearwater at lists.projectclearwater.org
Subject: RE: incorrect signup code ?

Matt,

I created a new set of issues by turning down the EC2 instances over the weekend. They of course came back up with different IP addresses.
I had to fix up DNS, and started to update the IP addresses in the various config files on the servers. I was actually able to follow the steps below, and create some test accounts. After attempting to register with a Bria client, I noticed there were still a number of config files that needed to be updated. As such, I decided to destroy the deployment (knife deployment delete -E clearwater) and create a new one. I also re-cloned the Clearwater chef repo.

There was one other issue I ran into doing this. The Chef client was unable to update the Chef server:

    ERROR: Server returned error for http://chef-server.<domain>:4000/cookbooks/apt/1.9.3, retrying 1/5 in 3s

The Chef server logs were showing:

    merb : chef-server (api) : worker (port 4000) ~ Connection failed - user: chef - (Bunny::ProtocolError)

It turns out that RabbitMQ had lost the "chef" account as a result of the Chef server EC2 instance having been shut down. This issue is mentioned here:

    http://codeblog.majakorpi.net/post/34180903354/chef-server-rabbitmq-trouble-with-ubuntu-12-04-1

The following workaround solved the issue:

    rabbitmqctl add_vhost /chef
    rabbitmqctl add_user chef <amqp password>
    rabbitmqctl set_permissions -p /chef chef ".*" ".*" ".*"

Otherwise, the new deployment went smoothly. The DNS entries were correctly created this time. Ellis initially showed "failed to update server" but started working after a few minutes.

I'm now able to successfully place calls between 2 Bria clients! It works fine if the clients have recently registered, but after a while, calls are rejected. I'm guessing this is just a matter of tuning timers to keep pinholes open.

Thanks for all the assistance!

James
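
On the calls being rejected after a while: before tuning pinhole timers, it may be worth ruling out registration expiry. As a rough illustration only (this is not sprout's or Bria's actual code), the Python sketch below shows how a client would be expected to pick up the expiry granted in the 200 OK to its REGISTER, preferring the "expires" parameter on the Contact header. The function names, the sample Contact values and the 3600 second default request are assumptions made up for this example; only the 5 minute cap comes from the discussion above.

```python
import re

# From the thread: sprout caps registrations at 5 minutes.
MAX_EXPIRES = 300


def granted_expiry(contact_header, expires_header=None, requested=3600):
    """Return the registration lifetime in seconds that a client should use.

    Order of preference: the "expires" parameter on the Contact header of the
    200 OK, then the Expires header, then the value the client asked for.
    Assumes a single Contact value (no comma-separated list).
    """
    # Only look at header parameters, i.e. whatever follows the closing ">".
    # Parameters inside the angle brackets belong to the URI, not the binding.
    params = contact_header.rsplit(">", 1)[-1]
    match = re.search(r";\s*expires\s*=\s*(\d+)", params, re.IGNORECASE)
    if match:
        return int(match.group(1))
    if expires_header is not None:
        return int(expires_header)
    return requested


def binding_lapses_early(requested, granted):
    """True if a client that ignores the response would re-register too late."""
    return requested > granted


if __name__ == "__main__":
    # Hypothetical Contact header as it might appear in the 200 OK.
    contact = "<sip:6505550000@192.0.2.10:5060>;expires=%d" % MAX_EXPIRES
    granted = granted_expiry(contact, requested=3600)
    print(granted)                              # 300
    print(binding_lapses_early(3600, granted))  # True
```

If the second value is True for the expiry Bria is configured to request, a client that ignores the Contact "expires" parameter would let its binding lapse, which would match calls failing a few minutes after registration.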
