Thanks for replying, Adam. All the nodes have started working, but *clearwater_cluster_manager_process* does not exist:

[bono]ubuntu@bono:~$ sudo monit summary
[sudo] password for ubuntu:
Monit 5.18.1 uptime: 22h 7m

 Service Name                       Status                        Type
 node-bono                          Running                       System
 restund_process                    Running                       Process
 ntp_process                        Running                       Process
 clearwater_queue_manager_pro...    Running                       Process
 etcd_process                       Running                       Process
 clearwater_diags_monitor_pro...    Running                       Process
 clearwater_config_manager_pr...    Running                       Process
*clearwater_cluster_manager_p...    Execution failed | Does...    Process*
 bono_process                       Running                       Process
 poll_restund                       Status ok                     Program
 monit_uptime                       Status ok                     Program
 clearwater_queue_manager_uptime    Status ok                     Program
 etcd_uptime                        Status ok                     Program
 poll_etcd_cluster                  Status ok                     Program
 poll_etcd                          Status ok                     Program
 poll_bono                          Status ok                     Program

How can I get it running? It is showing "Execution failed | Does not exist" on every node except vellum.
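This is roughly what I have been trying in order to dig into that failure, in case it helps (assuming the monit service names shown above; the underlying init service name clearwater-cluster-manager is my assumption):

    # Check whether the package that provides the process is installed at all
    dpkg -l | grep clearwater-cluster-manager

    # Ask monit for the detailed status of the failing check
    sudo monit status clearwater_cluster_manager_process

    # Try restarting via monit, or directly via the init service
    sudo monit restart clearwater_cluster_manager_process
    sudo service clearwater-cluster-manager restart   # service name is an assumption

Is that the right approach, or is there a supported way to bring it back?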
In parallel, I am also installing Clearwater from scratch using static IPs (NAT + Host-only network). Please guide me on a solution for that as well, if possible.
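For that fresh install, this is roughly the network and local_config layout I have in mind, in case you can spot a problem with it (treating eth1 as the host-only adapter; the exact keys in /etc/clearwater/local_config, the hostname, and using the host-only address for both IPs are my assumptions, not a confirmed recommendation):

    # /etc/network/interfaces -- eth0 stays on the NAT adapter (DHCP),
    # eth1 is the host-only adapter with a permanent static address
    auto eth1
    iface eth1 inet static
        address 192.168.56.110
        netmask 255.255.255.0

    # /etc/clearwater/local_config -- since no clients outside the host-only
    # network need to reach the deployment, both IPs point at the host-only
    # address here (assumption)
    local_ip=192.168.56.110
    public_ip=192.168.56.110
    public_hostname=bono-1.cw.local          # hypothetical hostname
    etcd_cluster=192.168.56.110,192.168.56.111,192.168.56.112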
Thanks,
Sunil

On Thu, Apr 19, 2018 at 3:46 PM, Sunil Kumar <[email protected]> wrote:

> The lost node is not the master node, but the IPs of the master nodes changed and I have updated them.
>
> thanks
>
> On Thu, Apr 19, 2018 at 3:23 PM, Sunil Kumar <[email protected]> wrote:
>
>> Hi Adam,
>>
>> Thanks a lot for replying. I am using VirtualBox to install the VMs. Earlier I was using a bridged adapter, so each VM takes its network IP from DHCP, and I set public_ip and local_ip to the same address.
>>
>> As you mention, for static IPs I tried *NAT + Host-only Network* (NAT as the primary interface eth0), but all nodes get the same IP on eth0 from NAT, 10.0.2.15 (is it fine for all nodes to have the same IP there, since I am not using that IP?), and I assigned static host-only IPs such as 192.168.56.110, etc.
>>
>> 1) Can I set both local_ip and public_ip to the host-only IP (192.168.56.110 etc.), or should public_ip be the IP of the host machine on which VirtualBox is installed, since with NAT the VMs use the host's IP as the public IP to reach the outside world?
>>
>> 2) Is public_ip necessary at all? I only want to run stress testing within the same network; I don't want to set up the numbers on clients like Zoiper.
>>
>> 3) Is port forwarding necessary with *NAT + Host-only Network*? The nodes are able to communicate with each other over the host-only network, so I don't think port forwarding is needed.
>>
>> 4) I just want to run stress testing to handle 1 lakh (100,000) calls/sec. How many sprout and vellum nodes are needed for that many calls?
>>
>> Thanks,
>> Sunil
>>
>> On Thu, Apr 19, 2018 at 2:42 PM, Adam Lindley <[email protected]> wrote:
>>
>>> Hi Sunil,
>>>
>>> I’m afraid the steps you’ve taken are not supported in Project Clearwater deployments: both changing the ‘local_ip’ of a node, and removing nodes just by deleting the VMs.
>>>
>>> On the first point, you need to be able to give your VMs permanent static IP addresses.
>>>
>>> On the second, by deleting the VMs in your cluster, your underlying etcd cluster has lost quorum. I would suggest http://clearwater.readthedocs.io/en/stable/Handling_Multiple_Failed_Nodes.htm as a starting point for recovering information from it. However, as your single remaining node will likely also have problems due to the local IP changing, you may simply want to redeploy from scratch.
>>>
>>> More generally, you seem to have hit a substantial number of issues in deploying Project Clearwater, which is both not what we want and not what the experience of many other users seems to be. I would suggest taking a wider look over our provided documentation, and making sure your environment matches our expectations and that you’re clear on our processes. This should make your next deployment a lot smoother.
>>>
>>> Cheers, and good luck,
>>>
>>> Adam
>>>
>>> *From:* Clearwater [mailto:[email protected]] *On Behalf Of* Sunil Kumar
>>> *Sent:* 19 April 2018 07:16
>>> *To:* [email protected]
>>> *Subject:* Re: [Project Clearwater] Unable to contact the etcd cluster
>>>
>>> Hi,
>>>
>>> The nodes with IPs 10.224.61.109, 10.224.61.112, etc. are no longer there; I have deleted those nodes directly. It looks like they are still in the etcd cluster. Can you please tell me how to remove them?
>>>
>>> [IST Apr 19 19:32:45] error : 'etcd_process' process is not running
>>> [IST Apr 19 19:32:45] info : 'etcd_process' trying to restart
>>> [IST Apr 19 19:32:45] info : 'etcd_process' restart: /bin/bash
>>> [IST Apr 19 19:33:15] error : 'etcd_process' failed to restart (exit status -1) -- /bin/bash: Program timed out -- zmq_msg_recv: Resource temporarily unavailable
>>> cat: /var/run/clearwater-etcd/clearwater-etcd.pid: No such file or directory
>>> cat: /var/run/clearwater-etcd/clearwater-etcd.pid: No such file or directory
>>> context deadline excee
>>> [IST Apr 19 19:33:25] error : 'etcd_process' process is not running
>>> [IST Apr 19 19:33:25] info : 'etcd_process' trying to restart
>>> [IST Apr 19 19:33:25] info : 'etcd_process' restart: /bin/bash
>>> [IST Apr 19 19:33:55] error : 'etcd_process' failed to restart (exit status -1) -- /bin/bash: Program timed out -- zmq_msg_recv: Resource temporarily unavailable
>>> client: etcd cluster is unavailable or misconfigured; error #0: *dial tcp 10.224.61.109:4000*: getsockopt: no route to host
>>> ; error #1: dial tcp 10.224.61.47:4000: getsockopt: co
>>> [IST Apr 19 19:34:05] error : 'etcd_process' process is not running
>>> [IST Apr 19 19:34:05] info : 'etcd_process' trying to restart
>>> [IST Apr 19 19:34:05] info : 'etcd_process' restart: /bin/bash
>>> [IST Apr 19 19:34:36] error : 'etcd_process' failed to restart (exit status 2) -- /bin/bash: zmq_msg_recv: Resource temporarily unavailable
>>> context deadline exceeded
>>>
>>> On Thu, Apr 19, 2018 at 11:03 AM, Sunil Kumar <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> Can anybody help me with this? After the IPs were lost, I updated the IPs in local_config and DNS and restarted the services. The extra VMs were deleted; for example, I had 3 sprout nodes, so 2 were deleted.
>>>
>>> [vellum]ubuntu@vellum:~$ cw-config upload shared_config
>>> Unable to contact the etcd cluster.
>>>
>>> thanks
>>> sunil
>>>
>>> _______________________________________________
>>> Clearwater mailing list
>>> [email protected]
>>> http://lists.projectclearwater.org/mailman/listinfo/clearwater_lists.projectclearwater.org
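On my earlier question in the quoted thread about removing the deleted nodes from etcd, this is roughly what I intend to try from the surviving node (plain etcdctl v2 commands against the port-4000 client endpoint seen in the logs above; the endpoint IP and member ID are placeholders, and with quorum already lost these may simply fail, which is presumably why the Handling_Multiple_Failed_Nodes procedure or a redeploy was suggested):

    # From the surviving node, list the members etcd still thinks are in the cluster
    etcdctl --endpoints http://192.168.56.110:4000 member list

    # Remove a dead member using the ID printed by the previous command (placeholder ID)
    etcdctl --endpoints http://192.168.56.110:4000 member remove 8e9e05c52164694d

    # Check overall cluster health afterwards
    etcdctl --endpoints http://192.168.56.110:4000 cluster-health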
_______________________________________________
Clearwater mailing list
[email protected]
http://lists.projectclearwater.org/mailman/listinfo/clearwater_lists.projectclearwater.org
