Re: [Project Clearwater] Manual Installation No longer Working

Robert Day (projectclearwater.org) Tue, 19 Apr 2016 11:29:18 -0700

It's odd that it's not doing anything - could you run "sudo bash -x 
/usr/share/clearwater/infrastructure/scripts/create-ellis-nginx-config", which 
should show every command that script runs, and send me the output?
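
A minimal sketch of capturing that trace to a file so it can be sent as an attachment. The helper name capture_trace and the /tmp output path are illustrative assumptions, not part of Clearwater.

```shell
#!/usr/bin/env bash
# capture_trace SCRIPT LOGFILE
# Runs SCRIPT under "bash -x" and saves both its normal output and the
# command trace (written to stderr) to LOGFILE, then appends the exit status.
capture_trace() {
    local script="$1" logfile="$2"
    bash -x "$script" > "$logfile" 2>&1
    local status=$?
    echo "exit status: $status" >> "$logfile"
    return "$status"
}

# On the Ellis node this needs root, for example from a root shell:
#   capture_trace /usr/share/clearwater/infrastructure/scripts/create-ellis-nginx-config \
#       /tmp/create-ellis-nginx-config.trace
```

Recording the exit status alongside the trace helps tell a script that exits early without an error from one that fails silently.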

Thanks,
Rob
________________________________
From: Rashid Mijumbi <[email protected]>
Sent: 19 April 2016 00:17
To: Robert Day (projectclearwater.org)
Subject: Re: [Project Clearwater] Manual Installation No longer Working

Hi Rob,

Many thanks for looking at the logs.

I have run sudo 
/usr/share/clearwater/infrastructure/scripts/create-ellis-nginx-config and it 
returns successful (without any output) but still the file 
/etc/nginx/sites-available/ellis does not exist. The folder /var/log/ellis is 
also still empty even after service ellis stop.

I have followed the installation instructions at 
http://clearwater.readthedocs.org/en/stable/Manual_Install.html religiously, so 
I would not say that I am doing anything wrong!

Regards,

Rashid

On 18 April 2016 at 20:33, Robert Day (projectclearwater.org) 
<[email protected]> wrote:
Hi Rashid,

Chris sent me the Ellis diagnostics you provided, and asked me to take a look. 
There's no /etc/nginx/sites-available/ellis file, which is odd - can you try 
running "sudo 
/usr/share/clearwater/infrastructure/scripts/create-ellis-nginx-config"?

I'm wondering whether the issue here is that:

*         nginx isn't configured properly

*         so it's not exposing Ellis on port 80

*         so our regular health-checks of Ellis are failing

*         so we keep automatically restarting the Ellis process to try and fix 
it

*         and also, because no requests are getting through to Ellis, it isn't 
doing anything, so isn't producing any logs

That would explain the following diagnostics from monit.log:

[UTC Apr 10 06:51:30] info     : 'ellis_process' restart: /etc/monit/run_logged
[UTC Apr 10 06:51:41] info     : 'ellis_process' process is running with pid 6122
[UTC Apr 10 06:51:41] error    : 'poll_ellis' HTTP failed to http://192.168.40.30/ping
[UTC Apr 10 06:51:51] error    : 'poll_ellis' HTTP failed to http://192.168.40.30/ping
[UTC Apr 10 06:51:51] info     : 'poll_ellis' exec: /etc/init.d/ellis
[UTC Apr 10 06:52:01] error    : 'ellis_process' process is not running
[UTC Apr 10 06:52:01] info     : 'ellis_process' trying to restart

If that's the issue, "sudo 
/usr/share/clearwater/infrastructure/scripts/create-ellis-nginx-config" might 
fix things up (or fail with an informative error message).
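
To see how often this restart loop is happening, the monit log can be summarised. This is a rough sketch: the function names are made up for illustration, and the log path /var/log/monit.log in the usage comment is an assumption (the thread only names the file monit.log).

```shell
#!/usr/bin/env bash
# count_matches PATTERN LOGFILE: print how many lines of LOGFILE contain PATTERN.
count_matches() {
    grep -c -- "$1" "$2"
}

# summarise_ellis_monit LOGFILE: count failed Ellis health checks and the
# restart attempts that monit made, using the message text quoted above.
summarise_ellis_monit() {
    local log="$1"
    echo "poll_ellis failures: $(count_matches "'poll_ellis' HTTP failed" "$log")"
    echo "ellis restart attempts: $(count_matches "'ellis_process' trying to restart" "$log")"
}

# Usage (log path is an assumption):
#   summarise_ellis_monit /var/log/monit.log
# The missing nginx site file can be checked directly:
#   test -f /etc/nginx/sites-available/ellis && echo present || echo missing
```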

Best,
Rob

From: Clearwater [mailto:[email protected]] On Behalf Of Chris Elford 
(projectclearwater.org)
Sent: 11 April 2016 16:10
To: Rashid Mijumbi <[email protected]>
Cc: [email protected]
Subject: Re: [Project Clearwater] Manual Installation No longer Working

Hi Rashid,

Your file permissions look right to me. It looks like something is going wrong 
with Ellis before it starts producing any output, so it may be hard to find out 
what is going wrong.

Can you send me a diagnostics package from your Ellis node? There are 
instructions at 
https://github.com/Metaswitch/clearwater-infrastructure/blob/master/clearwater-diags-monitor.md.
 The mailing list will filter out large attachments, so I suggest sending it to 
me directly, but we should keep the rest of our conversation on the mailing 
list.

The issue with homer/homestead may be related, but I am not sure yet. I think 
it is best to try to solve one problem at a time.

Yours,

Chris

From: Rashid Mijumbi [mailto:[email protected]]
Sent: 08 April 2016 18:17
To: Chris Elford (projectclearwater.org) <[email protected]>
Cc: [email protected]
Subject: Re: [Project Clearwater] Manual Installation No longer Working

Hi Chris,

Many thanks for your support.

I ran both sudo monit stop -g ellis and sudo service ellis run, and both 
commands returned immediately without any output. They also did not change 
anything in how ellis behaves.

However, as you state, the problem might be file permissions. I have seen that 
/usr/share/clearwater/ellis/env/bin/python exists with permissions rwxr-xr-x 
and owner ellis.

Moreover, /etc/default/ellis does NOT exist, and of course /etc/default exists 
and is owned by root with permissions rwxr-xr-x

Do you think this might also be the cause of homer/homestead installation 
problems ?

Regards,

Rashid

On 7 April 2016 at 16:06, Chris Elford (projectclearwater.org) 
<[email protected]> wrote:
Hi Rashid,

It looks to me as though Ellis is finishing its installation, but failing to 
start properly, before it gets a chance to write any logs.

We may be able to get some more information by looking at the output when you 
run monit manually.

*         First stop monit from trying to start Ellis automatically: sudo monit 
stop -g ellis

*         Then try to run Ellis: sudo service ellis run

If Ellis doesn't quit after 5 minutes, then you can exit using Ctrl-C.

When you are done, remember to run sudo monit start -g ellis to put everything 
back the way it was.
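
The three steps above can be sketched as one function, so that monit is always told to resume supervising Ellis. The function name is hypothetical, and using "timeout 300" in place of a manual Ctrl-C after 5 minutes is an assumption of this sketch.

```shell
#!/usr/bin/env bash
# run_ellis_in_foreground: stop monit's supervision of Ellis, run Ellis in the
# foreground for at most 5 minutes, then hand control back to monit.
run_ellis_in_foreground() {
    sudo monit stop -g ellis || return 1
    # "timeout 300" stands in for pressing Ctrl-C after 5 minutes;
    # timeout exits with status 124 if it had to stop the command.
    sudo timeout 300 service ellis run
    local status=$?
    # Restore monit's supervision whatever Ellis did.
    sudo monit start -g ellis
    return "$status"
}

# Usage, capturing everything Ellis prints:
#   run_ellis_in_foreground 2>&1 | tee /tmp/ellis-foreground.log
```

If the script itself is interrupted before it reaches the last step, sudo monit start -g ellis still needs to be run by hand.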

There are a couple of other things to look at. In the past, we have had some 
trouble with file permissions. Does /usr/share/clearwater/ellis/env/bin/python 
exist, and what are its file permissions and owner? Does /etc/default/ellis 
exist, and what are its file permissions and owner?
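
A small sketch for answering both questions in one go. It assumes GNU stat, as shipped on Ubuntu, and show_perms is just an illustrative name.

```shell
#!/usr/bin/env bash
# show_perms PATH...: for each path, print its permissions, owner and group,
# or say that it is missing.
show_perms() {
    local path
    for path in "$@"; do
        if [ -e "$path" ]; then
            stat -c '%A %U:%G %n' "$path"   # GNU stat format string
        else
            echo "missing: $path"
        fi
    done
}

# Usage on the Ellis node:
#   show_perms /usr/share/clearwater/ellis/env/bin/python /etc/default/ellis
```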

Yours,

Chris

From: Rashid Mijumbi [mailto:[email protected]]
Sent: 06 April 2016 17:09
To: Chris Elford (projectclearwater.org) <[email protected]>
Subject: Re: [Project Clearwater] Manual Installation No longer Working

Hi Chris,

Please see install stderr + stdout files for the VMs as well as output for 
running monit summary and monit status on ellis.

These logs are when I do a clean/new installation. If I remove a single package 
from a node and re-install it, it seems to complete the installation - I attach 
this case for HS when I remove and re-install clearwater-prov-tools.

In any case, as you can see while ellis appears to be running, there are no 
logs at all in /var/log/ellis, and the ellis URL is not available.

FWIW, I install the nodes in the order: Ellis -> Bono -> Sprout -> Homer -> HS 
-> Ralf

Best Regards,

Rashid

On 6 April 2016 at 13:54, Chris Elford (projectclearwater.org) 
<[email protected]> wrote:
Hi Rashid,

Thank you for trying that.

You ran `sudo monit summary` and everything listed was running. What is the 
output from `sudo monit status`? There may be some missing processes that 
should be running.

It looks like something went wrong installing the package clearwater-prov-tools 
on Homestead. Can you please try installing that again, and capture all of the 
output (stderr and stdout)? You may have to uninstall it first. That will give 
us more information about where the installation may be failing. If Ellis is 
not running, it may be a good idea to do the same thing for Ellis.
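
One possible way to capture stdout and stderr from the reinstall in a single file. This is a sketch only: the function name and log path are assumptions, and whether "remove" (rather than "purge") is the right way to uninstall is not stated in this thread.

```shell
#!/usr/bin/env bash
# reinstall_with_log PACKAGE LOGFILE
# Removes and reinstalls PACKAGE, writing stdout and stderr of both steps
# to LOGFILE. Returns the exit status of the install step.
reinstall_with_log() {
    local pkg="$1" log="$2"
    {
        echo "== removing $pkg =="
        sudo apt-get remove -y "$pkg"
        echo "== installing $pkg =="
        sudo apt-get install -y "$pkg"
    } > "$log" 2>&1
}

# Usage on the Homestead node:
#   reinstall_with_log clearwater-prov-tools /tmp/clearwater-prov-tools-install.log
```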

Yours,

Chris

From: Rashid Mijumbi [mailto:[email protected]]
Sent: 05 April 2016 14:46
To: Chris Elford (projectclearwater.org) <[email protected]>
Cc: [email protected]
Subject: Re: [Project Clearwater] Manual Installation No longer Working

Hi Chris,

Many thanks for your quick response.

I have to say that my previous shared_config files did not include scscf 
settings since it was mentioned in the installation guide that these were not 
mandatory. However, I have now included the (new) settings.

I have also included the following lines to my DNS zone files (I am not sure if 
I need all of them, or if they are correctly created, but I already have 
similar lines for sprout):

scscf.sprout.$domain.    IN  A      $sprout_ip
scscf-1                  IN  A      $sprout_ip
scscf                    IN  A      $sprout_ip
scscf                    IN  NAPTR  1 1 "S" "SIP+D2T" "" _sip._tcp.scscf.sprout
_sip._tcp.scscf.sprout   IN  SRV    0 0 5054 scscf-1
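
Whether records like these resolve as intended can be checked with dig. The sketch below assumes the zone origin is $domain, so that the relative name _sip._tcp.scscf.sprout expands to _sip._tcp.scscf.sprout.$domain. The function name is made up for illustration.

```shell
#!/usr/bin/env bash
# check_scscf_dns DOMAIN: look up the S-CSCF A and SRV records under DOMAIN
# and print whatever the resolver returns (an empty answer means no record).
check_scscf_dns() {
    local domain="$1"
    echo "A scscf.sprout.$domain: $(dig +short "scscf.sprout.$domain" A)"
    echo "SRV _sip._tcp.scscf.sprout.$domain: $(dig +short "_sip._tcp.scscf.sprout.$domain" SRV)"
}

# Usage, with the real zone name in place of example.com:
#   check_scscf_dns example.com
```

An empty SRV answer would suggest that the relative names are being expanded against a different origin than expected.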

With these changes, I still run into the same problems as previously: ellis not 
starting, errors especially for homer, homestead.

Regards,

Rashid

On 5 April 2016 at 11:59, Chris Elford (projectclearwater.org) 
<[email protected]> wrote:
Hi Rashid,

It looks like there are two separate problems here:

*         Homer is failing to install.

*         Ellis is not producing any logs (so probably isn't running).

You say that you are using the same shared_config as previously. In release 
Articuno, we changed the way Sprout is configured, and this may be causing some 
of your problems. From the release note:

Once you've upgraded, you may also need to change your sproutlet configuration, 
and add a DNS record for your S-CSCF cluster.
Previously most Sproutlets/Application servers (AS) either used the value of 
the 'scscf_uri' parameter for their configuration or had other hard-coded 
configuration. Now you have the finer control over the configuration - each 
Sproutlet/AS has three configuration options. The options have the same format 
for each Sproutlet/AS, as listed here, with <sproutlet> replaced by the 
appropriate Sproutlet or AS name:
*         <sproutlet>: The port that the Sproutlet/AS listens on. The default 
value is 5054 for some Sproutlets/ASs (those enabled by default) and 0 for 
others (those disabled by default)
*         <sproutlet>_prefix: The identifier prefix for this Sproutlet/AS, used 
to build the uri, as described below. The default value is simply the 
Sproutlet/AS name: <sproutlet>
*         <sproutlet>_uri: The full identifier for this Sproutlet/AS, used for 
routing and receiving requests between nodes. The default value is created 
using the prefix and the hostname of the parent Sprout node, i.e. 
"sip:<sproutlet_prefix>.<sprout_hostname>;transport=tcp". We recommend that you 
don't set this yourself anymore, and use the defaults provided.
As a concrete example, below are the S-CSCF options and the default values.
*         scscf=5054
*         scscf_prefix=scscf
*         scscf_uri=sip:scscf.<sprout_hostname>;transport=tcp
As we've split out the S-CSCF configuration, you'll also now need to set up DNS 
records for the S-CSCF cluster specifically (rather than just using the sprout 
cluster).
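
To make the default URI rule from the release note concrete, here is a small sketch that builds it from a prefix and a Sprout hostname. The function name and the hostname sprout.example.com are placeholders, not values from this deployment.

```shell
#!/usr/bin/env bash
# default_sproutlet_uri PREFIX SPROUT_HOSTNAME
# Builds the default described in the release note:
#   sip:<sproutlet_prefix>.<sprout_hostname>;transport=tcp
default_sproutlet_uri() {
    printf 'sip:%s.%s;transport=tcp\n' "$1" "$2"
}

# Values from the release note's S-CSCF example.
scscf=5054
scscf_prefix=scscf
sprout_hostname=sprout.example.com   # placeholder hostname (assumption)

echo "scscf port: $scscf"
echo "default scscf_uri: $(default_sproutlet_uri "$scscf_prefix" "$sprout_hostname")"
```

The hostname part of the resulting URI is the name that the new S-CSCF DNS records have to cover.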

A good first step would be to update your shared configuration files and see 
whether that fixes any of your issues. Once you have done that, we may be able 
to dig deeper into the other issues.

Yours,

Chris

From: Clearwater [mailto:[email protected]] On Behalf Of Rashid Mijumbi
Sent: 05 April 2016 11:01
To: [email protected]
Subject: [Project Clearwater] Manual Installation No longer Working

Dear all, I have not been able to successfully do a manual installation for 
the last 3 or so days. I did so (multiple times) successfully in the past.

Here are my details:

VMs are running in OpenStack with Ubuntu-14.04-trusty-server-x86_64, 1vCPU, 2GB 
RAM, 8GB Disk

It appears that I am getting multiple errors during installation. One of the 
errors is during installation of homer which repeatedly gives the error:

TCP poll failed to 127.0.0.1 9160
nc: connect to 127.0.0.1 port 9160 (tcp) failed: Connection refused
Failed to connect to '127.0.0.1:7199': Connection refused

for about 5 minutes. Installation of homestead gives the same errors for an 
even longer period.
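
Ports 9160 and 7199 are the default Cassandra Thrift and JMX ports, so these messages suggest the installer is polling for a local Cassandra that has not come up yet. A rough sketch for checking this by hand follows. It uses bash's built-in /dev/tcp rather than nc, and the function names are illustrative.

```shell
#!/usr/bin/env bash
# port_open HOST PORT: succeed if a TCP connection to HOST:PORT can be opened.
# Uses bash's /dev/tcp pseudo-device, so it needs bash rather than plain sh.
port_open() {
    (exec 3<>"/dev/tcp/$1/$2") 2>/dev/null
}

# wait_for_port HOST PORT TRIES: poll once a second, up to TRIES times.
# Returns 0 as soon as the port accepts a connection, 1 if it never does.
wait_for_port() {
    local host="$1" port="$2" tries="$3" i=0
    while [ "$i" -lt "$tries" ]; do
        port_open "$host" "$port" && return 0
        sleep 1
        i=$((i + 1))
    done
    return 1
}

# Usage on a Homer or Homestead node:
#   wait_for_port 127.0.0.1 9160 60 && echo "port 9160 is up" || echo "port 9160 still closed"
```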

Ultimately, the installation completes, but I cannot access the ellis URL in a 
browser.

I would like to believe that my local_config and shared_config files are okay 
as I have used the same setup in the past and it worked quite well.

The command "sudo service ellis stop" seems to work, while "sudo service 
clearwater-infrastructure restart" gives the following output

 * Restarting clearwater-infrastructure clearwater-infrastructure
nginx: [warn] conflicting server name "87.44.18.128" on [::]:80, ignored
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful
Configuring monit for only localhost access

                      [ OK ]

"sudo monit status" says that everything is either running or status is ok, 
while "sudo clearwater-etcdctl cluster-health" shows that all cluster members 
are healthy.

Is anyone facing a similar problem, or is it just something horribly wrong with 
my machines?

I am attaching a number of installation error log files as well as files from 
/var/log. The folder /var/log/ellis is empty!

Regards,

Rashid

_______________________________________________
Clearwater mailing list
[email protected]
http://lists.projectclearwater.org/mailman/listinfo/clearwater_lists.projectclearwater.org
