On Wed, Oct 12, 2016 at 10:57 PM, Babu Shanmugam <bscha...@redhat.com> wrote:
> On Thursday 13 October 2016 07:26 AM, Andy Zhou wrote:
>
>> On Sun, Oct 9, 2016 at 12:02 AM, Babu Shanmugam <bscha...@redhat.com> wrote:
>>
>>> On Friday 07 October 2016 05:33 AM, Andy Zhou wrote:
>>>
>>>> Babu, thank you for working on this. At a high level, it is not clear to
>>>> me where the boundary between the OCF scripts and the ovn-ctl script
>>>> lies -- i.e., which aspect is managed by which entity. For example:
>>>> 1) Which scripts are responsible for starting the ovsdb servers?
>>>
>>> The ovsdb servers are started by pacemaker. Pacemaker uses the OCF script,
>>> and the OCF script uses ovn-ctl.
>>>
>>>> 2) Which script should manage the fail-over? I tried to shut down a
>>>> cluster node using the "pcs" command, and fail-over did not happen.
>>>
>>> The OCF script for the OVN DB servers understands the promote and demote
>>> calls, so pacemaker uses this script to run the ovsdb servers on all the
>>> nodes and to promote one node as the master (active server). If the node
>>> on which the master instance is running fails, pacemaker automatically
>>> promotes another node as the master. The OCF script is an agent for
>>> pacemaker for the OVN DB resource.
>>>
>>> The above behavior depends on the way you configure the resource that
>>> uses this OCF script. I am attaching a simple set of commands to
>>> configure the ovsdb servers. You can create the resources after creating
>>> the cluster with the following command:
>>>
>>>   crm configure < ovndb.pcmk
>>>
>>> Please note, you have to replace the macros VM1_NAME, VM2_NAME, VM3_NAME
>>> and MASTER_IP with the respective values before using ovndb.pcmk. This
>>> script works with a 3-node cluster. I am assuming the node ids are 101,
>>> 102, and 103. Please replace them as well to match your cluster.
>>>
>>> --
>>> Babu
>>
>> Unfortunately, CRM is not distributed with pacemaker on CentOS anymore.
>> It took me some time to get it installed. I think others may run into
>> similar issues, so it may be worthwhile to document this, or to change
>> the script to use "pcs", which is part of the distribution.
>
> I agree. Is INSTALL*.md good enough? In openstack, we are managing the
> resource through puppet manifests.

O.K.

>> I adapted the script to my setup. I have two nodes, "h1" (10.33.74.77)
>> and "h2" (10.33.75.158). For MASTER_IP, I used 10.33.75.220.
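Since crm is no longer packaged for CentOS, the same cluster bring-up can be
done with pcs. A minimal sketch, assuming the pcs 0.9.x syntax shipped with
CentOS 7; the node names, cluster name, and the <MASTER_IP> placeholder are
taken from the examples in this thread:

  # Authenticate the nodes and create/start a two-node cluster.
  pcs cluster auth h1 h2 -u hacluster
  pcs cluster setup --name mycluster h1 h2
  pcs cluster start --all

  # Disable fencing for this test setup and create the virtual IP that
  # should follow the active OVN DB server.
  pcs property set stonith-enabled=false
  pcs resource create ClusterIP ocf:heartbeat:IPaddr2 ip=<MASTER_IP> cidr_netmask=32 op monitor interval=30s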
>> This is the output of crm configure show:
>> ------
>> [root@h2 azhou]# crm configure show
>> node 1: h1 \
>>         attributes
>> node 2: h2
>> primitive ClusterIP IPaddr2 \
>>         params ip=10.33.75.200 cidr_netmask=32 \
>>         op start interval=0s timeout=20s \
>>         op stop interval=0s timeout=20s \
>>         op monitor interval=30s
>> primitive WebSite apache \
>>         params configfile="/etc/httpd/conf/httpd.conf" statusurl="http://127.0.0.1/server-status" \
>>         op start interval=0s timeout=40s \
>>         op stop interval=0s timeout=60s \
>>         op monitor interval=1min \
>>         meta
>> primitive ovndb ocf:ovn:ovndb-servers \
>>         op start interval=0s timeout=30s \
>>         op stop interval=0s timeout=20s \
>>         op promote interval=0s timeout=50s \
>>         op demote interval=0s timeout=50s \
>>         op monitor interval=1min \
>>         meta
>> colocation colocation-WebSite-ClusterIP-INFINITY inf: WebSite ClusterIP
>> order order-ClusterIP-WebSite-mandatory ClusterIP:start WebSite:start
>> property cib-bootstrap-options: \
>>         have-watchdog=false \
>>         dc-version=1.1.13-10.el7_2.4-44eb2dd \
>>         cluster-infrastructure=corosync \
>>         cluster-name=mycluster \
>>         stonith-enabled=false
>
> You seem to have configured ovndb just as a primitive resource and not as a
> master/slave resource, and there is no colocation constraint configured for
> ovndb with ClusterIP. Only with the colocation constraint will the ovndb
> server be co-located with the ClusterIP resource. You will have to include
> the following lines in your crm configuration. You can configure the same
> with pcs as well.
>
>   ms ovndb-master ovndb meta notify="true"
>   colocation colocation-ovndb-master-ClusterIP-INFINITY inf: ovndb-master:Started ClusterIP:Master
>   order order-ClusterIP-ovndb-master-mandatory inf: ClusterIP:start ovndb-master:start

Done. Now it shows the following:

[root@h2 ovs]# crm configure show
node 1: h1 \
        attributes
node 2: h2
primitive ClusterIP IPaddr2 \
        params ip=10.33.75.200 cidr_netmask=32 \
        op start interval=0s timeout=20s \
        op stop interval=0s timeout=20s \
        op monitor interval=30s
primitive ovndb ocf:ovn:ovndb-servers \
        op start interval=0s timeout=30s \
        op stop interval=0s timeout=20s \
        op promote interval=0s timeout=50s \
        op demote interval=0s timeout=50s \
        op monitor interval=1min \
        meta
ms ovndb-master ovndb \
        meta notify=true
colocation colocation-ovndb-master-ClusterIP-INFINITY inf: ovndb-master:Started ClusterIP:Master
order order-ClusterIP-ovndb-master-mandatory inf: ClusterIP:start ovndb-master:start
property cib-bootstrap-options: \
        have-watchdog=false \
        dc-version=1.1.13-10.el7_2.4-44eb2dd \
        cluster-infrastructure=corosync \
        cluster-name=mycluster \
        stonith-enabled=false
property ovn_ovsdb_master_server: \
        OVN_REPL_INFO=h1

>> --------
>> I have also added firewall rules to allow access to TCP ports 6642 and 6641.
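The three crm lines suggested above have a rough pcs counterpart. A sketch,
again assuming pcs 0.9.x and the resource names from the configuration shown
here:

  # Wrap the ovndb primitive in a master/slave resource with notifications enabled.
  pcs resource master ovndb-master ovndb notify=true

  # Co-locate ovndb-master with ClusterIP and start ClusterIP first,
  # mirroring the colocation and order constraints above.
  pcs constraint colocation add ovndb-master with ClusterIP INFINITY
  pcs constraint order start ClusterIP then start ovndb-master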
>> At this stage, crm_mon shows:
>>
>> Last updated: Wed Oct 12 14:49:07 2016    Last change: Wed Oct 12 13:58:55 2016 by root via crm_attribute on h2
>> Stack: corosync
>> Current DC: h2 (version 1.1.13-10.el7_2.4-44eb2dd) - partition with quorum
>> 2 nodes and 3 resources configured
>>
>> Online: [ h1 h2 ]
>>
>> ClusterIP    (ocf::heartbeat:IPaddr2):       Started h2
>> WebSite      (ocf::heartbeat:apache):        Started h2
>> ovndb        (ocf::ovn:ovndb-servers):       Started h1
>>
>> Failed Actions:
>> * ovndb_start_0 on h2 'unknown error' (1): call=39, status=Timed Out, exitreason='none',
>>     last-rc-change='Wed Oct 12 14:43:03 2016', queued=0ms, exec=30003ms
>>
>> ---
>> Not sure what the error message on h2 is about. Notice that the ovndb
>> service is now running on h1, while the cluster IP is on h2.
>
> Looks like the OCF script is not able to start the ovsdb servers on the
> 'h2' node (we are getting a timed-out status). You can check whether the
> OCF script is working by using ocf-tester. You can run ocf-tester as
> follows:
>
>   ocf-tester -n test-ovndb -o master_ip 10.0.0.1 <path-to-the-ocf-script>

My installation does not have ocf-tester. There is a program called ocft
with a test option. Not sure if this is a suitable replacement. If not, how
can I get the ocf-tester program?

I ran the ocft program and got the following output. Not sure what it means.

[root@h2 ovs]# ocft test -n test-ovndb -o master_ip 10.0.0.1 /usr/share/openvswitch/scripts/ovndb-servers.ocf
ERROR: cases directory not found.

> Alternately, you can check whether the ovsdb servers are started properly
> by running:
>
>   /usr/share/openvswitch/scripts/ovn-ctl --db-sb-sync-from=10.0.0.1 --db-nb-sync-from=10.0.0.1 start_ovsdb

The output is as follows. Should we use --db-sb-sync-from-addr instead?

[root@h2 ovs]# /usr/share/openvswitch/scripts/ovn-ctl --db-sb-sync-from=10.0.0.1 --db-nb-sync-from=10.0.0.1 start_ovsdb
/usr/share/openvswitch/scripts/ovn-ctl: unknown option "--db-sb-sync-from=10.0.0.1" (use --help for help)
/usr/share/openvswitch/scripts/ovn-ctl: unknown option "--db-nb-sync-from=10.0.0.1" (use --help for help)

'ovn-ctl' runs without any error message after I fixed the command line
parameter.

>> Also, both servers are running as backup servers:
>>
>> [root@h1 azhou]# ovs-appctl -t /run/openvswitch/ovnsb_db.ctl ovsdb-server/sync-status
>> state: backup
>> connecting: tcp:192.0.2.254:6642   // I specified the IP in /etc/openvswitch/ovnsb-active.conf, but the file was over-written with 192.0.2.254
>>
>> [root@h2 ovs]# ovs-appctl -t /run/openvswitch/ovnsb_db.ctl ovsdb-server/sync-status
>> state: backup
>> replicating: tcp:10.33.74.77:6642  // The IP address was retained on h2
>> database: OVN_Southbound
>> ---
>> Any suggestions on what I did wrong?
>
> I think this is mostly due to the crm configuration. Once you add the 'ms'
> and 'colocation' resources, you should be able to overcome this problem.

No, ovndb still failed to launch on h2.
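For reference, the manual check that eventually ran cleanly looks roughly as
follows. This is a sketch: --db-sb-sync-from-addr and --db-nb-sync-from-addr
are the option spellings suggested above rather than confirmed here, and
10.0.0.1 is just the placeholder master address used in this thread.

  # Start both OVN DB servers as backups syncing from the given master address.
  /usr/share/openvswitch/scripts/ovn-ctl \
      --db-sb-sync-from-addr=10.0.0.1 \
      --db-nb-sync-from-addr=10.0.0.1 start_ovsdb

  # Check the replication state of the southbound database server.
  ovs-appctl -t /run/openvswitch/ovnsb_db.ctl ovsdb-server/sync-status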
[root@h2 ovs]# crm status
Last updated: Thu Oct 13 11:27:42 2016    Last change: Thu Oct 13 11:17:25 2016 by root via cibadmin on h2
Stack: corosync
Current DC: h2 (version 1.1.13-10.el7_2.4-44eb2dd) - partition with quorum
2 nodes and 3 resources configured

Online: [ h1 h2 ]

Full list of resources:

 ClusterIP      (ocf::heartbeat:IPaddr2):       Started h1
 Master/Slave Set: ovndb-master [ovndb]
     Masters: [ h1 ]
     Stopped: [ h2 ]

Failed Actions:
* ovndb_start_0 on h2 'unknown error' (1): call=39, status=Timed Out, exitreason='none',
    last-rc-change='Wed Oct 12 14:43:03 2016', queued=0ms, exec=30003ms

> I have never tried colocating two resources with the ClusterIP resource.
> Just for testing, is it possible to drop the WebServer resource?

Done. It did not make any difference that I can see.
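One way to dig further into the timed-out start on h2 is to invoke the OCF
agent by hand with the parameter pacemaker would pass. A sketch only,
assuming the standard OCF_ROOT / OCF_RESKEY_* conventions for OCF resource
agents and the master_ip parameter used with ocf-tester above; the agent may
expect additional OCF_* variables.

  # Run the agent's start action directly and check its exit code (0 = success).
  export OCF_ROOT=/usr/lib/ocf
  export OCF_RESKEY_master_ip=<MASTER_IP>
  /usr/share/openvswitch/scripts/ovndb-servers.ocf start; echo $?

  # Then look at the ovsdb-server logs under /var/log/openvswitch/ for why
  # the start did not complete within the 30-second timeout.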