On Thursday 13 October 2016 07:26 AM, Andy Zhou wrote:

On Sun, Oct 9, 2016 at 12:02 AM, Babu Shanmugam <bscha...@redhat.com <mailto:bscha...@redhat.com>> wrote:

    On Friday 07 October 2016 05:33 AM, Andy Zhou wrote:

        Babu, thank you for working on this.  At a high level, it is
        not clear to me where the boundary lies between the OCF scripts
        and the ovn-ctl script -- i.e., which aspect is managed by
        which entity.  For example: 1) Which scripts are responsible
        for starting the ovsdb servers?

    The ovsdb servers are started by pacemaker. Pacemaker uses the OCF
    script, and the OCF script in turn uses ovn-ctl.

        2) Which script should manage the fail-over -- I tried to shut
        down a cluster node using the "pcs" command, and fail-over did
        not happen.

    The OCF script for the OVN DB servers understands the promote and
    demote calls. Pacemaker uses this script to run the ovsdb servers
    on all the nodes and to promote one node as the master (active
    server). If the node running the master instance fails, pacemaker
    automatically promotes another node as the master. The OCF script
    is pacemaker's agent for the OVN DB resource.
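
    For reference, a minimal sketch of what promote and demote boil
    down to underneath the OCF agent (the ovn-ctl sub-command names
    and flags here are assumptions based on this series; check your
    tree):

    # Promote: stop replicating and serve the databases as the active server.
    # (Assumed sub-commands; names may differ in your version of ovn-ctl.)
    /usr/share/openvswitch/scripts/ovn-ctl promote_ovnnb
    /usr/share/openvswitch/scripts/ovn-ctl promote_ovnsb
    # Demote: go back to replicating from the current master
    # (<master-ip> is a placeholder).
    /usr/share/openvswitch/scripts/ovn-ctl --db-nb-sync-from=<master-ip> demote_ovnnb
    /usr/share/openvswitch/scripts/ovn-ctl --db-sb-sync-from=<master-ip> demote_ovnsb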
    That failover behavior depends on how you configure the resource
    that uses this OCF script. I am attaching a simple set of commands
    to configure the ovsdb servers. You can create the resources after
    creating the cluster with the following command:
    crm configure < ovndb.pcmk

    Please note, you have to replace the macros VM1_NAME, VM2_NAME,
    VM3_NAME and MASTER_IP with the respective values before using
    ovndb.pcmk. This script works with a 3-node cluster; I have assumed
    node ids 101, 102, and 103, so please replace those as well to
    match your cluster.
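
    For anyone reading along without the attachment, here is a minimal
    sketch of the kind of configuration ovndb.pcmk sets up (the
    attached file is authoritative; the master_ip parameter name
    follows the agent's usage elsewhere in this thread):

    node 101: VM1_NAME
    node 102: VM2_NAME
    node 103: VM3_NAME
    primitive ovndb ocf:ovn:ovndb-servers \
            params master_ip=MASTER_IP \
            op monitor interval=30s
    ms ovndb-master ovndb meta notify="true"
    primitive ClusterIP IPaddr2 params ip=MASTER_IP
    colocation colocation-ovndb-master-ClusterIP-INFINITY inf: ovndb-master:Master ClusterIP:Started
    order order-ClusterIP-ovndb-master-mandatory inf: ClusterIP:start ovndb-master:start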


Unfortunately, crm is not distributed with pacemaker on CentOS anymore. It took me some time to get it installed. I think others may run into similar issues, so it may be worthwhile to document this, or to change the script to use "pcs", which is part of the distribution.

I agree. Is INSTALL*.md good enough? In OpenStack, we manage the resource through puppet manifests.

I adapted the script to my setup. I have two nodes, "h1" and "h2". For MASTER_IP, I used the cluster IP.

This is the output of crm configure show:


[root@h2 azhou]# crm configure show
node 1: h1
node 2: h2
primitive ClusterIP IPaddr2 \
        op start interval=0s timeout=20s \
        op stop interval=0s timeout=20s \
        op monitor interval=30s
primitive WebSite apache \
        op start interval=0s timeout=40s \
        op stop interval=0s timeout=60s \
        op monitor interval=1min
primitive ovndb ocf:ovn:ovndb-servers \
        op start interval=0s timeout=30s \
        op stop interval=0s timeout=20s \
        op promote interval=0s timeout=50s \
        op demote interval=0s timeout=50s \
        op monitor interval=1min
colocation colocation-WebSite-ClusterIP-INFINITY inf: WebSite ClusterIP
order order-ClusterIP-WebSite-mandatory ClusterIP:start WebSite:start
property cib-bootstrap-options: \
        ...

You seem to have configured ovndb just as a primitive resource and not as a master/slave resource, and there is no colocation constraint configured for ovndb with ClusterIP. Only with that colocation constraint will the ovndb master be co-located with the ClusterIP resource. You will have to include the following lines in your crm configuration (you can configure the same with pcs as well):

ms ovndb-master ovndb meta notify="true"
colocation colocation-ovndb-master-ClusterIP-INFINITY inf: ovndb-master:Master ClusterIP:Started
order order-ClusterIP-ovndb-master-mandatory inf: ClusterIP:start ovndb-master:start
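
If crm is not available, the rough pcs equivalents would be the following (syntax as in the pcs 0.9 series shipped with CentOS 7; treat this as a sketch and check "pcs help" for your version):

pcs resource master ovndb-master ovndb meta notify=true
pcs constraint colocation add master ovndb-master with ClusterIP INFINITY
pcs constraint order start ClusterIP then start ovndb-master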


I have also added firewall rules to allow access to TCP port 6642 and port 6641.
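
With firewalld, that amounts to something like (default zone assumed):

firewall-cmd --permanent --add-port=6641/tcp
firewall-cmd --permanent --add-port=6642/tcp
firewall-cmd --reload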

At this stage, crm_mon shows:

Last updated: Wed Oct 12 14:49:07 2016          Last change: Wed Oct 12 13:58:55 2016 by root via crm_attribute on h2
Stack: corosync
Current DC: h2 (version 1.1.13-10.el7_2.4-44eb2dd) - partition with quorum
2 nodes and 3 resources configured

Online: [ h1 h2 ]

ClusterIP (ocf::heartbeat:IPaddr2): Started h2
WebSite (ocf::heartbeat:apache): Started h2
ovndb (ocf::ovn:ovndb-servers): Started h1

Failed Actions:
* ovndb_start_0 on h2 'unknown error' (1): call=39, status=Timed Out, exitreason=
    last-rc-change='Wed Oct 12 14:43:03 2016', queued=0ms, exec=30003ms


Not sure what the error message on h2 is about. Notice that the ovndb service is now running on h1, while the cluster IP is on h2.

It looks like the OCF script is not able to start the ovsdb servers on the 'h2' node (we are getting a timed-out status). You can check whether the OCF script is working correctly by running it through ocf-tester:

ocf-tester -n test-ovndb -o master_ip=<master-ip> <path-to-the-ocf-script>
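
For example (the IP here is a placeholder, and the path assumes the agent is installed under the "ovn" provider directory implied by ocf:ovn:ovndb-servers):

ocf-tester -n test-ovndb -o master_ip=192.0.2.10 /usr/lib/ocf/resource.d/ovn/ovndb-servers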

Alternatively, you can check whether the ovsdb servers start properly by running

/usr/share/openvswitch/scripts/ovn-ctl --db-sb-sync-from=<master-ip> --db-nb-sync-from=<master-ip> start_ovsdb

Also, both servers are running as backup servers:

[root@h1 azhou]# ovs-appctl -t /run/openvswitch/ovnsb_db.ctl ovsdb-server/sync-status
state: backup
connecting: tcp: <>   // I specified the IP in /etc/openvswitch/ovnsb-active.conf, but the file was overwritten with

[root@h2 ovs]# ovs-appctl -t /run/openvswitch/ovnsb_db.ctl ovsdb-server/sync-status
state: backup
replicating: tcp: <>   // The IP address was retained on h2
database: OVN_Southbound


Any suggestions on what I did wrong?

I think this is mostly due to the crm configuration. Once you add the 'ms' resource and the 'colocation' constraint, you should be able to overcome this problem.
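
If you want to check the replication state by hand in the meantime, the standard ovsdb-server replication appctl commands can point a backup at the active server (<master-ip> is a placeholder; see ovsdb-server(1)):

# Tell the backup which server is active, connect to it, and re-check status.
ovs-appctl -t /run/openvswitch/ovnsb_db.ctl ovsdb-server/set-active-ovsdb-server tcp:<master-ip>:6642
ovs-appctl -t /run/openvswitch/ovnsb_db.ctl ovsdb-server/connect-active-ovsdb-server
ovs-appctl -t /run/openvswitch/ovnsb_db.ctl ovsdb-server/sync-status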

I have never tried colocating two resources with the ClusterIP resource. Just for testing, is it possible to drop the WebSite resource?

Thank you,
