Hi Nicola,

You should have the IP addresses of all four Sprouts in the etcd_cluster value on your fifth Sprout. The etcd_cluster value is only used at start of day, when the node first joins the cluster, and it should contain the IP addresses of every node that is currently a member of the etcd cluster.
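For example, with your deployment below, the etcd_cluster line in /etc/clearwater/local_config on the fifth Sprout would look something like this (assuming all four existing Sprouts are still members of the etcd cluster):

    etcd_cluster=10.4.0.156,10.4.0.157,10.4.0.159,10.4.0.130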
Do you know if your cluster-manager process was running on your old Sprouts during the scale-up? Would you be able to send me your /var/log/monit.log? Were there any logs in /var/log/clearwater-cluster-manager/cluster-manager.output.log or in /var/log/clearwater-cluster-manager/cluster-manager_<timestamp>.log?

Ellie

From: Nicola Principe [mailto:[email protected]]
Sent: 13 October 2015 08:43
To: Eleanor Merry
Cc: [email protected]
Subject: RE: [Clearwater] - Sprout_process does not exit, elastic scaling

Hi Ellie,

I do not have the logs at the moment because I have already cleaned up the scenario and started again with my old deployment (3 Sprouts), trying to add a 4th one. I found myself in the same situation described below again, but this time, thanks to your suggestions, I checked the cluster-manager logs of the existing Sprouts and noticed that they were just empty! I then restarted clearwater-cluster-manager on my 3 existing Sprouts and also applied the shared config on all four of them. It worked.

Describing the Sprout Memcached cluster in site site1:
The local node is in this cluster
The cluster is stable
  10.4.0.157 is in state normal
  10.4.0.156 is in state normal
  10.4.0.159 is in state normal
  10.4.0.130 is in state normal

Describing the Sprout Chronos cluster in site site1:
The local node is in this cluster
The cluster is stable
  10.4.0.157 is in state normal
  10.4.0.156 is in state normal
  10.4.0.159 is in state normal
  10.4.0.130 is in state normal

It's strange, because I tested the elastic scaling with Sprout version 1.0-150717.1xxxxx and it just worked. Now I'm on 1.0-150928.173306.

On another note, the etcd_cluster variable in local_config should contain the IPs of the existing deployment. My first Sprout was 10.4.0.156; I then added 3 more. My question: when adding the 5th Sprout, should I include all the already-existing Sprouts in etcd_cluster, or would setting just the first, original Sprout's IP be sufficient?

Thanks,
Nicola

From: Eleanor Merry [mailto:[email protected]]
Sent: Friday, 09 October 2015 17:39
To: Datatronics - Nicola Principe; [email protected]
Subject: RE: [Clearwater] - Sprout_process does not exit, elastic scaling

Hi Nicola,

It looks like the existing Sprouts aren't picking up the changes from the new Sprout. The clearwater-cluster-manager process on the new Sprout is waiting for the clearwater-cluster-manager processes on the other Sprouts to acknowledge its existence before it kicks the new Sprout process to reload its cluster_settings file (which is why Sprout hasn't recognised that the cluster_settings file is available yet). Can you please send me the clearwater-cluster-manager logs from one of the old Sprouts (in /var/log/clearwater-cluster-manager/)?

To remove a node from the deployment, you should follow the docs at http://clearwater.readthedocs.org/en/latest/Clearwater_Elastic_Scaling/index.html#if-you-did-a-manual-install or http://clearwater.readthedocs.org/en/latest/Handling_Failed_Nodes/index.html (to force-remove the node). In this case, though, I think it's the existing nodes that are misbehaving, so removing the new node won't help.

Ellie

From: Clearwater [mailto:[email protected]] On Behalf Of Nicola Principe
Sent: 09 October 2015 13:20
To: [email protected]
Subject: Re: [Clearwater] - Sprout_process does not exit, elastic scaling

Hi community,

I have solved the "does not exist" issue (it was due to file permissions), but the Sprout cluster was then left in an unstable state.
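(In case anyone else hits the same thing: a quick check is simply to confirm that the file is readable, e.g.

    ls -l /etc/clearwater/cluster_settings

The expected owner and mode may vary between deployments, so treat the exact values as deployment-specific.)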
I have tried to decommission the new Sprout, but clearwater-cluster-manager is not able to complete the query:

UTC ERROR common_etcd_synchronizer.py:139 (thread ChronosPlugin): 10.4.0.130 caught EtcdException("Unable to decode server response: HTTPConnectionPool(host='10.4.0.130', port=4000): Read timed out.",) when trying to read with index 1478037 - pause before retry

Is there a way to manually decommission a node from a deployment and leave the cluster stable?

Thanks.

From: Clearwater [mailto:[email protected]] On Behalf Of Nicola Principe
Sent: Friday, 09 October 2015 12:38
To: [email protected]
Subject: [Clearwater] - Sprout_process does not exit, elastic scaling

Hi community,

I have a PCW deployment with 3 Sprouts and 2 Homesteads. I have tried to add a 4th Sprout following the automatic clustering scaling instructions, but it does not work. On one of the Sprouts already in the deployment I see this (10.4.0.130 is the new Sprout):

Describing the Sprout Memcached cluster in site site1:
The local node is in this cluster
The cluster is *not* stable
  10.4.0.157 is in state normal
  10.4.0.156 is in state normal
  10.4.0.159 is in state normal
  10.4.0.130 is in state joining, acknowledged change

Describing the Sprout Chronos cluster in site site1:
The local node is in this cluster
The cluster is *not* stable
  10.4.0.157 is in state normal
  10.4.0.156 is in state normal
  10.4.0.159 is in state normal
  10.4.0.130 is in state joining, acknowledged change

But on the new Sprout node the sprout_process does not exist:

[sprout]manager@sprout-4:/var/log/sprout$ sudo monit status
The Monit daemon 5.8.1 uptime: 7m

Process 'sprout_process'
  status                        Does not exist
  monitoring status             Monitored
  data collected                Fri, 09 Oct 2015 12:23:25

Program 'poll_sprout_sip'
  status                        Initializing
  monitoring status             Initializing
  data collected                Fri, 09 Oct 2015 12:15:54

Program 'poll_sprout_http'
  status                        Initializing
  monitoring status             Initializing
  data collected                Fri, 09 Oct 2015 12:15:54

In the logs I can see the following:

09-10-2015 10:22:33.009 UTC Error memcached_config.cpp:133: Failed to open '/etc/clearwater/cluster_settings'
09-10-2015 10:22:33.009 UTC Error memcachedstore.cpp:184: Failed to read config, keeping previous settings
09-10-2015 10:22:33.010 UTC Error main.cpp:1885: Cluster settings file '/etc/clearwater/cluster_settings' does not contain a valid set of servers

...but the cluster_settings file has been generated by etcd automatically. Do you have any suggestions to sort this out?

Thanks,
Nicola
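P.S. For reference, I believe the generated cluster_settings file should just contain a servers line of this form (assuming the default memcached port, 11211; I understand it also carries a new_servers line while a resize is in progress):

    servers=10.4.0.157:11211,10.4.0.156:11211,10.4.0.159:11211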
