Nope, nothing in the logs suggests the node is fenced while rebooting. Moreover, the same behaviour persists with pacemaker started, and I had explicitly put the node into standby in pacemaker before the reboot. The same behaviour persists with stonith-enabled=false, and also with a manual node fence via "stonith_admin --reboot node-1.spb.stone.local". So I don't think fencing is the issue here.
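
For reference, the checks above were roughly these (pcs syntax as in the quickstart article; invocations reproduced from memory, so treat this as a sketch):

    # put node-1 into standby before rebooting it
    pcs cluster standby node-1.spb.stone.local

    # disable stonith cluster-wide for the test
    pcs property set stonith-enabled=false

    # manual fence test
    stonith_admin --reboot node-1.spb.stone.local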

Yuriy Demchenko

On 11/07/2013 05:11 PM, Vishesh kumar wrote:
My understanding is that the node is being fenced while rebooting. I suggest you look into the fencing logs as well. If your fencing logs aren't detailed enough, use the following in cluster.conf to enable debug logging:

<logging>
    <logging_daemon name="fenced" debug="on"/>
</logging>
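
To pick up the change, something along these lines should work (RHEL 6 cman tooling; remember to bump config_version in cluster.conf first):

    # validate the edited cluster.conf, then push it to the running cluster
    ccs_config_validate
    cman_tool version -r    # "cman_tool version -r <new_version>" on older cman releases

    # with debug on, fenced output should land under /var/log/cluster/
    # (fenced.log on a default RHEL 6 setup)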

Thanks


On Thu, Nov 7, 2013 at 5:34 PM, Yuriy Demchenko <demchenko...@gmail.com> wrote:

    Hi,

    I'm trying to set up a 3-node cluster (2 nodes + 1 standby node for
    quorum) with the cman+pacemaker stack, following this quickstart
    article: http://clusterlabs.org/quickstart-redhat.html

    The cluster starts, all nodes see each other, quorum is gained and
    stonith works, but I've run into a problem with cman: a node can't
    join the cluster after a reboot. cman starts, but "cman_tool nodes"
    on that node reports only itself as a cluster member, while on the
    other 2 nodes it reports those 2 as members and the 3rd as offline.
    Stopping/starting/restarting cman on the problem node has no effect,
    it still sees only itself. But if I restart cman on one of the
    working nodes, everything goes back to normal: all 3 nodes join the
    cluster, and subsequent cman service restarts on any node work fine,
    the node leaves the cluster and rejoins successfully. But again,
    only until the node's OS is rebooted.

    For example:
    [1] Working cluster:

        [root@node-1 ~]# cman_tool nodes
        Node  Sts   Inc   Joined               Name
           1   M    592   2013-11-07 15:20:54  node-1.spb.stone.local
           2   M    760   2013-11-07 15:20:54  node-2.spb.stone.local
           3   M    760   2013-11-07 15:20:54  vnode-3.spb.stone.local
        [root@node-1 ~]# cman_tool status
        Version: 6.2.0
        Config Version: 10
        Cluster Name: ocluster
        Cluster Id: 2059
        Cluster Member: Yes
        Cluster Generation: 760
        Membership state: Cluster-Member
        Nodes: 3
        Expected votes: 3
        Total votes: 3
        Node votes: 1
        Quorum: 2
        Active subsystems: 7
        Flags:
        Ports Bound: 0
        Node name: node-1.spb.stone.local
        Node ID: 1
        Multicast addresses: 239.192.8.19
        Node addresses: 192.168.220.21

    The picture is the same on all 3 nodes (except for node name and id):
    same cluster name, cluster id, and multicast address.

    [2] I rebooted node-1. After the reboot completed, "cman_tool nodes"
    on node-2 and vnode-3 shows this:

        Node  Sts   Inc   Joined               Name
           1   X    760  node-1.spb.stone.local
           2   M    588   2013-11-07 15:11:23  node-2.spb.stone.local
           3   M    760   2013-11-07 15:20:54  vnode-3.spb.stone.local
        [root@node-2 ~]# cman_tool status
        Version: 6.2.0
        Config Version: 10
        Cluster Name: ocluster
        Cluster Id: 2059
        Cluster Member: Yes
        Cluster Generation: 764
        Membership state: Cluster-Member
        Nodes: 2
        Expected votes: 3
        Total votes: 2
        Node votes: 1
        Quorum: 2
        Active subsystems: 7
        Flags:
        Ports Bound: 0
        Node name: node-2.spb.stone.local
        Node ID: 2
        Multicast addresses: 239.192.8.19
        Node addresses: 192.168.220.22

    But on the rebooted node-1 it shows this:

        Node  Sts   Inc   Joined               Name
           1   M    764   2013-11-07 15:49:01  node-1.spb.stone.local
           2   X      0  node-2.spb.stone.local
           3   X      0  vnode-3.spb.stone.local
        [root@node-1 ~]# cman_tool status
        Version: 6.2.0
        Config Version: 10
        Cluster Name: ocluster
        Cluster Id: 2059
        Cluster Member: Yes
        Cluster Generation: 776
        Membership state: Cluster-Member
        Nodes: 1
        Expected votes: 3
        Total votes: 1
        Node votes: 1
        Quorum: 2 Activity blocked
        Active subsystems: 7
        Flags:
        Ports Bound: 0
        Node name: node-1.spb.stone.local
        Node ID: 1
        Multicast addresses: 239.192.8.19
        Node addresses: 192.168.220.21

    So, the same cluster name, cluster id and multicast address, but it
    can't see the other nodes. And there is nothing in /var/log/messages
    or /var/log/cluster/corosync.log on the other two nodes; they don't
    seem to notice node-1 coming back online at all, the last records
    are about node-1 leaving the cluster.

    [3] If I now do "service cman restart" on node-2 or vnode-3,
    everything goes back to normal operation as in [1]. In the logs it
    shows up as node-2 leaving the cluster (service stop) and then both
    node-2 and node-1 joining simultaneously (service start); the manual
    recovery sequence is sketched after the log excerpt below:

        Nov  7 11:47:06 vnode-3 corosync[26692]: [QUORUM] Members[2]: 2 3
        Nov  7 11:47:06 vnode-3 corosync[26692]:   [TOTEM ] A
        processor joined or left the membership and a new membership
        was formed.
        Nov  7 11:47:06 vnode-3 kernel: dlm: closing connection to node 1
        Nov  7 11:47:06 vnode-3 corosync[26692]:   [CPG   ] chosen
        downlist: sender r(0) ip(192.168.220.22) ; members(old:3 left:1)
        Nov  7 11:47:06 vnode-3 corosync[26692]:   [MAIN  ] Completed
        service synchronization, ready to provide service.
        Nov  7 11:53:28 vnode-3 corosync[26692]:   [QUORUM] Members[1]: 3
        Nov  7 11:53:28 vnode-3 corosync[26692]:   [TOTEM ] A
        processor joined or left the membership and a new membership
        was formed.
        Nov  7 11:53:28 vnode-3 corosync[26692]:   [CPG   ] chosen
        downlist: sender r(0) ip(192.168.220.14) ; members(old:2 left:1)
        Nov  7 11:53:28 vnode-3 corosync[26692]:   [MAIN  ] Completed
        service synchronization, ready to provide service.
        Nov  7 11:53:28 vnode-3 kernel: dlm: closing connection to node 2
        Nov  7 11:53:30 vnode-3 corosync[26692]:   [TOTEM ] A
        processor joined or left the membership and a new membership
        was formed.
        Nov  7 11:53:30 vnode-3 corosync[26692]:   [QUORUM]
        Members[2]: 1 3
        Nov  7 11:53:30 vnode-3 corosync[26692]:   [QUORUM]
        Members[2]: 1 3
        Nov  7 11:53:30 vnode-3 corosync[26692]:   [QUORUM]
        Members[3]: 1 2 3
        Nov  7 11:53:30 vnode-3 corosync[26692]:   [QUORUM]
        Members[3]: 1 2 3
        Nov  7 11:53:30 vnode-3 corosync[26692]:   [QUORUM]
        Members[3]: 1 2 3
        Nov  7 11:53:30 vnode-3 corosync[26692]:   [CPG   ] chosen
        downlist: sender r(0) ip(192.168.220.21) ; members(old:1 left:0)
        Nov  7 11:53:30 vnode-3 corosync[26692]:   [MAIN  ] Completed
        service synchronization, ready to provide service.
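
    For clarity, the manual recovery sequence is just:

        # on node-2 (or vnode-3), after node-1 has come back up stuck on its own:
        service cman restart

        # then verify membership from any node:
        cman_tool nodes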


    I've set up such a cluster before in pretty much the same
    configuration and never had any problems, but now I'm completely
    stuck. So, what is wrong with my cluster and how do I fix it?

    OS is CentOS 6.4 with the latest updates, firewall disabled, selinux
    permissive, all 3 nodes on the same network. Multicast is working,
    checked with omping (roughly as sketched below).
    cman.x86_64                   3.0.12.1-49.el6_4.2 @centos6-updates
    corosync.x86_64               1.4.1-15.el6_4.1 @centos6-updates
    pacemaker.x86_64              1.1.10-1.el6_4.4 @centos6-updates
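
    The omping check, run simultaneously on all three nodes (the vnode-3
    address is inferred from the corosync log above, so adjust if needed):

        # each node should report multicast replies from the other two
        omping 192.168.220.21 192.168.220.22 192.168.220.14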

    cluster.conf is attached.

-- Yuriy Demchenko





--
http://linuxmantra.com



-- 
Linux-cluster mailing list
Linux-cluster@redhat.com
https://www.redhat.com/mailman/listinfo/linux-cluster
