Hi Nicola,

It looks like the existing Sprouts aren't picking up the changes from the new 
Sprout. The clearwater-cluster-manager process on the new Sprout is waiting for 
the clearwater-cluster-manager processes on the other Sprouts to acknowledge 
its existence before it kicks the new Sprout process to reload its 
cluster_settings file (which is why Sprout hasn't recognised that the 
cluster_settings file is available yet).
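
(To illustrate the pattern rather than the actual implementation: the
manager is effectively doing something like the Python sketch below. The
etcd key and value layout are invented for this example; the real
clearwater-cluster-manager schema differs.)

# Sketch only: wait for the peer managers to acknowledge the new node
# before kicking sprout to reload cluster_settings.
import json
import time
import etcd  # python-etcd, the client the EtcdException below comes from

client = etcd.Client(host='10.4.0.130', port=4000)

def peers_acknowledged(key='/clustering/sprout/memcached'):
    # Assume (for this sketch) the value is a JSON map of node IP -> state,
    # mirroring the "is in state ..." cluster description output.
    cluster = json.loads(client.read(key).value)
    return all(state == 'normal' for state in cluster.values())

while not peers_acknowledged():
    time.sleep(2)  # peers haven't acknowledged yet; keep polling

# Only now would the manager rewrite /etc/clearwater/cluster_settings and
# kick the sprout process to re-read it (mechanism omitted from the sketch).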

Can you please send me the clearwater-cluster-manager logs from one of the old 
Sprouts (in /var/log/clearwater-cluster-manager/)?

To remove a node from the deployment, you should follow the docs at 
http://clearwater.readthedocs.org/en/latest/Clearwater_Elastic_Scaling/index.html#if-you-did-a-manual-install
or http://clearwater.readthedocs.org/en/latest/Handling_Failed_Nodes/index.html 
(to force-remove the node). In this case, though, I think it's the existing 
nodes that are misbehaving, so removing the new node won't help.
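
For reference, the etcd half of that force-remove boils down to deleting
the dead member via etcd's v2 members API. A rough Python sketch, pointed
at one of the healthy nodes (port 4000, as in your logs; the member ID
below is made up, so check it against the list output first):

# Sketch: list the etcd members, then force-remove one by ID via the
# v2 members API. Verify the ID before deleting anything.
import requests

BASE = 'http://10.4.0.157:4000/v2/members'

# 1. List the members and note the ID of the node to remove.
for member in requests.get(BASE).json()['members']:
    print(member['id'], member['peerURLs'])

# 2. Remove the decommissioned node by that ID (made-up ID shown here).
requests.delete(BASE + '/272e204152')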

Ellie

From: Clearwater [mailto:[email protected]] On 
Behalf Of Nicola Principe
Sent: 09 October 2015 13:20
To: [email protected]
Subject: Re: [Clearwater] - Sprout_process does not exist, elastic scaling

Hi community,

I have solved the "does not exist" issue (it was due to file permissions), 
but the Sprout cluster was then left in an unstable state.
I tried to decommission the new Sprout, but clearwater-cluster-manager is 
unable to complete the query:

UTC ERROR common_etcd_synchronizer.py:139 (thread ChronosPlugin): 10.4.0.130 
caught EtcdException("Unable to decode server response: 
HTTPConnectionPool(host='10.4.0.130', port=4000): Read timed out.",) when 
trying to read with index 1478037 - pause before retry
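
(For context: that log line is the synchronizer's blocking watch on etcd
timing out and being retried. Roughly, the pattern is the following Python,
with a made-up key:)

# The pattern behind the log line above: a long-poll read from the given
# etcd index, retried after a pause whenever it times out.
import time
import etcd

client = etcd.Client(host='10.4.0.130', port=4000)

index = 1478037
while True:
    try:
        result = client.read('/clustering/sprout/chronos',
                             wait=True, waitIndex=index, timeout=5)
        break
    except etcd.EtcdException:
        time.sleep(5)  # the "pause before retry" from the log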

Is there a way to manually decommission a node from a deployment and leave the 
cluster stable?

Thanks.


From: Clearwater [mailto:[email protected]] On 
Behalf Of Nicola Principe
Sent: Friday, 09 October 2015 12:38
To: [email protected]
Subject: [Clearwater] - Sprout_process does not exist, elastic scaling

Hi community,

I have a PCW deployment with 3 Sprouts and 2 Homesteads.
I tried to add a 4th Sprout following the automatic cluster scaling 
instructions, but it does not work.

On one of the Sprouts already in the deployment I see this (10.4.0.130 is 
the new Sprout):

Describing the Sprout Memcached cluster in site site1:
  The local node is in this cluster
  The cluster is *not* stable
    10.4.0.157 is in state normal
    10.4.0.156 is in state normal
    10.4.0.159 is in state normal
    10.4.0.130 is in state joining, acknowledged change

Describing the Sprout Chronos cluster in site site1:
  The local node is in this cluster
  The cluster is *not* stable
    10.4.0.157 is in state normal
    10.4.0.156 is in state normal
    10.4.0.159 is in state normal
    10.4.0.130 is in state joining, acknowledged change

But on the new Sprout node the sprout_process does not exist:

[sprout]manager@sprout-4:/var/log/sprout$ sudo monit status
The Monit daemon 5.8.1 uptime: 7m

Process 'sprout_process'
  status                            Does not exist
  monitoring status                 Monitored
  data collected                    Fri, 09 Oct 2015 12:23:25

Program 'poll_sprout_sip'
  status                            Initializing
  monitoring status                 Initializing
  data collected                    Fri, 09 Oct 2015 12:15:54

Program 'poll_sprout_http'
  status                            Initializing
  monitoring status                 Initializing
  data collected                    Fri, 09 Oct 2015 12:15:54

In the logs I can see the following:
09-10-2015 10:22:33.009 UTC Error memcached_config.cpp:133: Failed to open 
'/etc/clearwater/cluster_settings'
09-10-2015 10:22:33.009 UTC Error memcachedstore.cpp:184: Failed to read 
config, keeping previous settings
09-10-2015 10:22:33.010 UTC Error main.cpp:1885: Cluster settings file 
'/etc/clearwater/cluster_settings' does not contain a valid set of servers

...but the cluster_settings file has been generated by etcd automatically.
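
(For reference, a valid cluster_settings file is just one or two
comma-separated server lists, along these lines; the memcached port here is
an assumption:

servers=10.4.0.157:11211,10.4.0.156:11211,10.4.0.159:11211
new_servers=10.4.0.157:11211,10.4.0.156:11211,10.4.0.159:11211,10.4.0.130:11211

The "does not contain a valid set of servers" error above fires when sprout
cannot read a servers line like this from the file.)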

Do you have any suggestions for sorting this out?

Thanks,
Nicola

_______________________________________________
Clearwater mailing list
[email protected]
http://lists.projectclearwater.org/mailman/listinfo/clearwater_lists.projectclearwater.org
