On Wed, Oct 12, 2016 at 10:57 PM, Babu Shanmugam <bscha...@redhat.com> wrote:
> On Thursday 13 October 2016 07:26 AM, Andy Zhou wrote:
>
>> On Sun, Oct 9, 2016 at 12:02 AM, Babu Shanmugam <bscha...@redhat.com> wrote:
>>
>>> On Friday 07 October 2016 05:33 AM, Andy Zhou wrote:
>>>
>>>> Babu, thank you for working on this. At a high level, it is not clear to
>>>> me where the boundary between the OCF scripts and the ovn-ctl script
>>>> lies -- i.e., which aspect is managed by which entity. For example:
>>>> 1) Which scripts are responsible for starting the ovsdb servers?
>>>
>>> The ovsdb servers are started by pacemaker. Pacemaker uses the OCF script,
>>> and the OCF script uses ovn-ctl.
>>>
>>>> 2) Which script should manage the fail-over? I tried to shut down a
>>>> cluster node using the "pcs" command, and fail-over did not happen.
>>>
>>> The OCF script for the OVN DB servers understands the promote and demote
>>> calls, so pacemaker uses this script to run the ovsdb servers on all the
>>> nodes and to promote one node as the master (active server). If the node
>>> on which the master instance is running fails, pacemaker automatically
>>> promotes another node as the master. The OCF script is an agent for
>>> pacemaker for the OVN DB resource.
>>>
>>> The above behavior depends on the way you configure the resource that
>>> uses this OCF script. I am attaching a simple set of commands to
>>> configure the ovsdb servers. You can create the resources after creating
>>> the cluster with the following command:
>>>
>>>   crm configure < ovndb.pcmk
>>>
>>> Please note, you have to replace the macros VM1_NAME, VM2_NAME, VM3_NAME
>>> and MASTER_IP with the respective values before using ovndb.pcmk. This
>>> script works with a 3-node cluster. I am assuming the node ids are 101,
>>> 102, and 103. Please replace them as well to match your cluster.
>>>
>>> --
>>> Babu
>>
>> Unfortunately, CRM is not distributed with pacemaker on CentOS anymore.
>> It took me some time to get it installed. I think others may run into
>> similar issues, so it may be worthwhile to document this, or to change
>> the script to use "pcs", which is part of the distribution.
>
> I agree. Is INSTALL*.md good enough? In openstack, we are managing the
> resource through puppet manifests.

O.K.

>> I adapted the script to my setup. I have two nodes, "h1" (10.33.74.77)
>> and "h2" (10.33.75.158). For MASTER_IP, I used 10.33.75.220.
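Since crm is no longer packaged for CentOS, the same cluster bring-up can be
done with pcs. A minimal sketch, assuming the pcs 0.9.x syntax shipped with
CentOS 7; the node names, cluster name, and the <MASTER_IP> placeholder are
taken from the examples in this thread:

  # Authenticate the nodes and create/start a two-node cluster.
  pcs cluster auth h1 h2 -u hacluster
  pcs cluster setup --name mycluster h1 h2
  pcs cluster start --all

  # Disable fencing for this test setup and create the virtual IP that
  # should follow the active OVN DB server.
  pcs property set stonith-enabled=false
  pcs resource create ClusterIP ocf:heartbeat:IPaddr2 ip=<MASTER_IP> cidr_netmask=32 op monitor interval=30s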
>> This is the output of crm configure show:
>> ------
>> [root@h2 azhou]# crm configure show
>> node 1: h1 \
>>         attributes
>> node 2: h2
>> primitive ClusterIP IPaddr2 \
>>         params ip=10.33.75.200 cidr_netmask=32 \
>>         op start interval=0s timeout=20s \
>>         op stop interval=0s timeout=20s \
>>         op monitor interval=30s
>> primitive WebSite apache \
>>         params configfile="/etc/httpd/conf/httpd.conf" statusurl="http://127.0.0.1/server-status" \
>>         op start interval=0s timeout=40s \
>>         op stop interval=0s timeout=60s \
>>         op monitor interval=1min \
>>         meta
>> primitive ovndb ocf:ovn:ovndb-servers \
>>         op start interval=0s timeout=30s \
>>         op stop interval=0s timeout=20s \
>>         op promote interval=0s timeout=50s \
>>         op demote interval=0s timeout=50s \
>>         op monitor interval=1min \
>>         meta
>> colocation colocation-WebSite-ClusterIP-INFINITY inf: WebSite ClusterIP
>> order order-ClusterIP-WebSite-mandatory ClusterIP:start WebSite:start
>> property cib-bootstrap-options: \
>>         have-watchdog=false \
>>         dc-version=1.1.13-10.el7_2.4-44eb2dd \
>>         cluster-infrastructure=corosync \
>>         cluster-name=mycluster \
>>         stonith-enabled=false
>
> You seem to have configured ovndb just as a primitive resource and not as a
> master/slave resource, and there is no colocation constraint configured for
> ovndb with ClusterIP. Only with the colocation constraint will the ovndb
> server be co-located with the ClusterIP resource. You will have to include
> the following lines in your crm configuration. You can configure the same
> with pcs as well.
>
>   ms ovndb-master ovndb meta notify="true"
>   colocation colocation-ovndb-master-ClusterIP-INFINITY inf: ovndb-master:Started ClusterIP:Master
>   order order-ClusterIP-ovndb-master-mandatory inf: ClusterIP:start ovndb-master:start

Done. Now it shows the following:

[root@h2 ovs]# crm configure show
node 1: h1 \
        attributes
node 2: h2
primitive ClusterIP IPaddr2 \
        params ip=10.33.75.200 cidr_netmask=32 \
        op start interval=0s timeout=20s \
        op stop interval=0s timeout=20s \
        op monitor interval=30s
primitive ovndb ocf:ovn:ovndb-servers \
        op start interval=0s timeout=30s \
        op stop interval=0s timeout=20s \
        op promote interval=0s timeout=50s \
        op demote interval=0s timeout=50s \
        op monitor interval=1min \
        meta
ms ovndb-master ovndb \
        meta notify=true
colocation colocation-ovndb-master-ClusterIP-INFINITY inf: ovndb-master:Started ClusterIP:Master
order order-ClusterIP-ovndb-master-mandatory inf: ClusterIP:start ovndb-master:start
property cib-bootstrap-options: \
        have-watchdog=false \
        dc-version=1.1.13-10.el7_2.4-44eb2dd \
        cluster-infrastructure=corosync \
        cluster-name=mycluster \
        stonith-enabled=false
property ovn_ovsdb_master_server: \
        OVN_REPL_INFO=h1

>> --------
>> I have also added firewall rules to allow access to TCP ports 6642 and 6641.
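The three crm lines suggested above have a rough pcs counterpart. A sketch,
again assuming pcs 0.9.x and the resource names from the configuration shown
here:

  # Wrap the ovndb primitive in a master/slave resource with notifications enabled.
  pcs resource master ovndb-master ovndb notify=true

  # Co-locate ovndb-master with ClusterIP and start ClusterIP first,
  # mirroring the colocation and order constraints above.
  pcs constraint colocation add ovndb-master with ClusterIP INFINITY
  pcs constraint order start ClusterIP then start ovndb-master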
>> At this stage, crm_mon shows:
>>
>> Last updated: Wed Oct 12 14:49:07 2016    Last change: Wed Oct 12 13:58:55 2016 by root via crm_attribute on h2
>> Stack: corosync
>> Current DC: h2 (version 1.1.13-10.el7_2.4-44eb2dd) - partition with quorum
>> 2 nodes and 3 resources configured
>>
>> Online: [ h1 h2 ]
>>
>> ClusterIP    (ocf::heartbeat:IPaddr2):       Started h2
>> WebSite      (ocf::heartbeat:apache):        Started h2
>> ovndb        (ocf::ovn:ovndb-servers):       Started h1
>>
>> Failed Actions:
>> * ovndb_start_0 on h2 'unknown error' (1): call=39, status=Timed Out, exitreason='none',
>>     last-rc-change='Wed Oct 12 14:43:03 2016', queued=0ms, exec=30003ms
>>
>> ---
>> Not sure what the error message on h2 is about. Notice that the ovndb
>> service is now running on h1, while the cluster IP is on h2.
>
> Looks like the OCF script is not able to start the ovsdb servers on the
> 'h2' node (we are getting a timed-out status). You can check whether the
> OCF script is working by using ocf-tester. You can run ocf-tester as
> follows:
>
>   ocf-tester -n test-ovndb -o master_ip 10.0.0.1 <path-to-the-ocf-script>

My installation does not have ocf-tester. There is a program called ocft
with a test option. Not sure if this is a suitable replacement. If not, how
can I get the ocf-tester program?

I ran the ocft program and got the following output. Not sure what it means.

[root@h2 ovs]# ocft test -n test-ovndb -o master_ip 10.0.0.1 /usr/share/openvswitch/scripts/ovndb-servers.ocf
ERROR: cases directory not found.

> Alternately, you can check whether the ovsdb servers are started properly
> by running:
>
>   /usr/share/openvswitch/scripts/ovn-ctl --db-sb-sync-from=10.0.0.1 --db-nb-sync-from=10.0.0.1 start_ovsdb

The output is as follows. Should we use --db-sb-sync-from-addr instead?

[root@h2 ovs]# /usr/share/openvswitch/scripts/ovn-ctl --db-sb-sync-from=10.0.0.1 --db-nb-sync-from=10.0.0.1 start_ovsdb
/usr/share/openvswitch/scripts/ovn-ctl: unknown option "--db-sb-sync-from=10.0.0.1" (use --help for help)
/usr/share/openvswitch/scripts/ovn-ctl: unknown option "--db-nb-sync-from=10.0.0.1" (use --help for help)

'ovn-ctl' runs without any error message after I fixed the command line
parameter.

>> Also, both servers are running as backup servers:
>>
>> [root@h1 azhou]# ovs-appctl -t /run/openvswitch/ovnsb_db.ctl ovsdb-server/sync-status
>> state: backup
>> connecting: tcp:192.0.2.254:6642   // I specified the IP in /etc/openvswitch/ovnsb-active.conf, but the file was over-written with 192.0.2.254
>>
>> [root@h2 ovs]# ovs-appctl -t /run/openvswitch/ovnsb_db.ctl ovsdb-server/sync-status
>> state: backup
>> replicating: tcp:10.33.74.77:6642  // The IP address was retained on h2
>> database: OVN_Southbound
>> ---
>> Any suggestions on what I did wrong?
>
> I think this is mostly due to the crm configuration. Once you add the 'ms'
> and 'colocation' resources, you should be able to overcome this problem.

No, ovndb still failed to launch on h2.
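For reference, the manual check that eventually ran cleanly looks roughly as
follows. This is a sketch: --db-sb-sync-from-addr and --db-nb-sync-from-addr
are the option spellings suggested above rather than confirmed here, and
10.0.0.1 is just the placeholder master address used in this thread.

  # Start both OVN DB servers as backups syncing from the given master address.
  /usr/share/openvswitch/scripts/ovn-ctl \
      --db-sb-sync-from-addr=10.0.0.1 \
      --db-nb-sync-from-addr=10.0.0.1 start_ovsdb

  # Check the replication state of the southbound database server.
  ovs-appctl -t /run/openvswitch/ovnsb_db.ctl ovsdb-server/sync-status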
[root@h2 ovs]# crm status
Last updated: Thu Oct 13 11:27:42 2016    Last change: Thu Oct 13 11:17:25 2016 by root via cibadmin on h2
Stack: corosync
Current DC: h2 (version 1.1.13-10.el7_2.4-44eb2dd) - partition with quorum
2 nodes and 3 resources configured

Online: [ h1 h2 ]

Full list of resources:

 ClusterIP      (ocf::heartbeat:IPaddr2):       Started h1
 Master/Slave Set: ovndb-master [ovndb]
     Masters: [ h1 ]
     Stopped: [ h2 ]

Failed Actions:
* ovndb_start_0 on h2 'unknown error' (1): call=39, status=Timed Out, exitreason='none',
    last-rc-change='Wed Oct 12 14:43:03 2016', queued=0ms, exec=30003ms

> I have never tried colocating two resources with the ClusterIP resource.
> Just for testing, is it possible to drop the WebServer resource?

Done. It did not make any difference that I can see.
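One way to dig further into the timed-out start on h2 is to invoke the OCF
agent by hand with the parameter pacemaker would pass. A sketch only,
assuming the standard OCF_ROOT / OCF_RESKEY_* conventions for OCF resource
agents and the master_ip parameter used with ocf-tester above; the agent may
expect additional OCF_* variables.

  # Run the agent's start action directly and check its exit code (0 = success).
  export OCF_ROOT=/usr/lib/ocf
  export OCF_RESKEY_master_ip=<MASTER_IP>
  /usr/share/openvswitch/scripts/ovndb-servers.ocf start; echo $?

  # Then look at the ovsdb-server logs under /var/log/openvswitch/ for why
  # the start did not complete within the 30-second timeout.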