Re: [Project Clearwater] etcd cluster is unavailable or misconfigured

Adam Lindley Mon, 16 Oct 2017 09:14:55 -0700

Hi Devendra,

It looks like your etcd cluster is in a bad state. The fact that the error 
output states “Joining existing cluster... Not joining an unhealthy cluster” 
suggests this node was once part of a healthy cluster, but that the cluster has 
since lost quorum. Are you able to provide some more information about what led 
to the deployment being in this state?
It is possible that the etcd data became corrupt at some point due to a node 
failure, which is why the Ellis node is now attempting to rejoin the cluster. 
However, because the cluster as a whole is unhealthy, Ellis is unable to rejoin it.
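
To make "lost quorum" concrete: etcd uses the Raft consensus algorithm, which 
only makes progress while a strict majority of its voting members are healthy. 
A minimal Python sketch of that arithmetic, assuming all six IPs in your 
etcd_cluster list are voting members (the function names are illustrative only):

```python
def quorum_size(members: int) -> int:
    """Smallest strict majority of a cluster with `members` voting members."""
    if members < 1:
        raise ValueError("a cluster needs at least one member")
    return members // 2 + 1


def has_quorum(members: int, healthy: int) -> bool:
    """True if `healthy` members are enough for the cluster to keep operating."""
    return healthy >= quorum_size(members)


def tolerated_failures(members: int) -> int:
    """How many members can fail before quorum is lost."""
    return members - quorum_size(members)


if __name__ == "__main__":
    # Six nodes, as in the etcd_cluster parameter quoted in this thread.
    n = 6
    print(f"{n} members: quorum={quorum_size(n)}, "
          f"tolerates {tolerated_failures(n)} failures")
    for healthy in range(n, 2, -1):
        state = "ok" if has_quorum(n, healthy) else "quorum lost"
        print(f"{healthy} healthy -> {state}")
```

With six members the cluster needs four healthy nodes, so losing three at once 
is enough to leave it unable to accept new or rejoining members.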


I would suggest that you:

·         Ensure all nodes in the deployment are able to contact each other at 
the IPs listed in the ‘etcd_cluster’ parameter, and that they have ports 2380 
and 4000 open to traffic.

·         Check whether other nodes are also in an unhealthy state.

o   If multiple nodes have entered a bad state, you will need to re-create the 
etcd cluster from scratch. To do this, you can follow the process at 
https://clearwater.readthedocs.io/en/stable/Handling_Multiple_Failed_Nodes.html

o   If only your Ellis node is unable to rejoin the cluster, the most likely 
cause is that traffic from the Ellis node cannot reach the other members of the 
cluster.

·         Check that the local_config on each node is correct, and that the 
‘etcd_cluster’ parameter lists the IPs of all nodes in your cluster.
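
The first and last checks above can be scripted. This is a minimal Python 
sketch, not a Clearwater tool: it assumes local_config lives at 
/etc/clearwater/local_config and uses the key=value format shown in the quoted 
message below, and all helper names are hypothetical. A successful TCP connect 
only shows that a port is reachable; a refusal can mean either a firewall 
problem or that etcd is not running on that peer.

```python
import socket
import sys

# Ports that need to be open between nodes for etcd, per the advice above.
ETCD_PORTS = (2380, 4000)


def parse_local_config(text: str) -> dict:
    """Parse key=value lines, skipping blanks and '#' comments; strip quotes."""
    config = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        config[key.strip()] = value.strip().strip('"').strip("'")
    return config


def etcd_peers(config: dict) -> list:
    """Return the IPs in etcd_cluster; raise if it is empty or omits local_ip."""
    peers = [ip.strip() for ip in config.get("etcd_cluster", "").split(",")
             if ip.strip()]
    if not peers:
        raise ValueError("etcd_cluster is missing or empty")
    local_ip = config.get("local_ip")
    if local_ip not in peers:
        raise ValueError(f"local_ip {local_ip!r} is not listed in etcd_cluster")
    return peers


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_cluster(config_text: str) -> dict:
    """Map (ip, port) -> reachable for every etcd peer and every etcd port."""
    peers = etcd_peers(parse_local_config(config_text))
    return {(ip, port): port_open(ip, port) for ip in peers for port in ETCD_PORTS}


if __name__ == "__main__":
    # Assumed default path; pass another path as the first argument if needed.
    path = sys.argv[1] if len(sys.argv) > 1 else "/etc/clearwater/local_config"
    try:
        with open(path) as f:
            config_text = f.read()
    except OSError as exc:
        print(f"cannot read {path}: {exc}")
    else:
        try:
            results = check_cluster(config_text)
        except ValueError as exc:
            print(f"local_config problem: {exc}")
        else:
            for (ip, port), ok in sorted(results.items()):
                print(f"{ip}:{port} {'open' if ok else 'UNREACHABLE'}")
```

Running it on each node in turn gives a quick matrix of which peers and ports 
are reachable from where.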

Hopefully this can get your deployment up and running. Let us know how you get 
on.
If you aren’t able to get it up and running with the above, try taking a look 
in the logs under /var/log/clearwater-etcd/ to see if you can find anything to 
help guide you.
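
If there is a lot of log output, a small filter can help narrow it down. A 
minimal Python sketch, assuming the logs are plain-text files directly under 
/var/log/clearwater-etcd/ (rotated .gz files are skipped); the search pattern 
is a hypothetical starting point, not a list of known Clearwater log messages.

```python
import re
from pathlib import Path

LOG_DIR = Path("/var/log/clearwater-etcd")
# Hypothetical filter: lines mentioning errors, failures or cluster health.
PATTERN = re.compile(r"error|fail|unhealthy|quorum", re.IGNORECASE)


def interesting_lines(log_dir: Path, pattern=PATTERN, limit: int = 50) -> list:
    """Return up to `limit` (file name, line) matches, newest file first."""
    if not log_dir.is_dir():
        return []
    files = sorted(
        (p for p in log_dir.iterdir() if p.is_file() and p.suffix != ".gz"),
        key=lambda p: p.stat().st_mtime,
        reverse=True,
    )
    hits = []
    for path in files:
        try:
            with path.open(errors="replace") as f:
                for line in f:
                    if pattern.search(line):
                        hits.append((path.name, line.rstrip()))
                        if len(hits) >= limit:
                            return hits
        except OSError:
            # Unreadable file (e.g. needs root); skip it rather than abort.
            continue
    return hits


if __name__ == "__main__":
    for name, line in interesting_lines(LOG_DIR):
        print(f"{name}: {line}")
```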

Cheers,
Adam

From: Clearwater [mailto:[email protected]] On 
Behalf Of Devendra Singh
Sent: 16 October 2017 11:05
To: [email protected]
Subject: [Project Clearwater] etcd cluster is unavailable or misconfigured

Hi,

I am getting the error below in a manual installation 
(Bono, Ellis, Vellum, Homer, Dime, Sprout) on six machines:

[ellis]ist@ellis:~$ sudo monit summary
Monit 5.18.1 uptime: 54m
 Service Name                     Status                      Type
 node-ellis                       Running                     System
 ntp_process                      Running                     Process
 nginx_process                    Running                     Process
 mysql_process                    Running                     Process
 ellis_process                    Running                     Process
 clearwater_queue_manager_pro...  Running                     Process
 etcd_process                     Execution failed | Does...  Process
 clearwater_diags_monitor_pro...  Running                     Process
 clearwater_config_manager_pr...  Running                     Process
 clearwater_cluster_manager_p...  Running                     Process
 nginx_ping                       Status ok                   Program
 nginx_uptime                     Status ok                   Program
 monit_uptime                     Status ok                   Program
 poll_ellis                       Status ok                   Program
 poll_ellis_https                 Status ok                   Program
 clearwater_queue_manager_uptime  Status ok                   Program
 etcd_uptime                      Wait parent                 Program
 poll_etcd_cluster                Wait parent                 Program
 poll_etcd                        Wait parent                 Program
[ellis]ist@ellis:~$

[ellis]ist@ellis:~$ sudo service clearwater-etcd start
Error:  client: etcd cluster is unavailable or misconfigured; error #0: dial tcp 127.0.0.1:2379: getsockopt: connection refused
; error #1: dial tcp 127.0.0.1:4001: getsockopt: connection refused

error #0: dial tcp 127.0.0.1:2379: getsockopt: connection refused
error #1: dial tcp 127.0.0.1:4001: getsockopt: connection refused

Joining existing cluster...
Not joining an unhealthy cluster

------------------------------------------------------------------
local_config
---------------------------------------------------------------------
#Local IP configuration
local_ip=172.16.1.23
public_ip=172.16.1.23
public_hostname=ellis
etcd_cluster="172.16.1.23,172.16.2.133,172.16.4.195,172.16.5.22,172.16.4.37,172.16.1.142"

shared_config
---------------------------------------------------------------------

# Deployment definitions
home_domain=example.com
sprout_hostname=sprout.example.com
chronos_hostname=vellum.example.com:7253
hs_hostname=hs.example.com:8888
hs_provisioning_hostname=hs.example.com:8889
sprout_impi_store=vellum.example.com
cassandra_hostname=vellum.example.com
xdms_hostname=homer.example.com:7888
dime_session_store=vellum.example.com
upstream_port=0

# Email server configuration
smtp_smarthost=172.16.1.23
smtp_username=username
smtp_password=password
[email protected]

# Keys (you can change this secret to something else)
signup_key=secret
turn_workaround=secret
ellis_api_key=secret
ellis_cookie_key=secret

Please let me know if I have configured anything wrong.

Thanks and Regards,
Devendra

_______________________________________________
Clearwater mailing list
[email protected]
http://lists.projectclearwater.org/mailman/listinfo/clearwater_lists.projectclearwater.org
