On 2013-03-14 15:52, Fredrik Hudner wrote:
> I set no-quorum-policy to ignore and removed the constraint you
> mentioned. It then managed to fail over once to the slave node, but I
> still have these:
>
> Failed actions:
>
>     p_exportfs_root:0_monitor_30000 (node=testclu01, call=12, rc=7,
>     status=complete): not running
>
>     p_exportfs_root:1_monitor_30000 (node=testclu02, call=12, rc=7,
>     status=complete): not running

This only tells you that monitoring of these resources once found them
not running ... the logs should tell you what happened and when.

> I then stopped the new master node to see if it fell over to the
> other node, with no success. It remains slave.

Hard to say without seeing the current cluster state, e.g. "crm_mon
-1frA", "cat /proc/drbd" and some logs ... not enough information ...
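If it happens again, grab the state from both nodes right after the
failed failover, something along these lines (the log path is the
usual RHEL 6 default and is an assumption, it may differ on your
boxes):

    crm_mon -1frA     # one-shot status incl. fail counts, inactive resources, node attributes
    cat /proc/drbd    # DRBD roles, connection and sync state
    tail -n 200 /var/log/cluster/corosync.log   # assumed default log location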
> I also noticed that the constraint drbd-fence-by-handler-nfs-ms_drbd_nfs
> was back in the crm configuration. Seems like the cib does a replace:
>
> Mar 14 15:06:18 [1786] tdtestclu02 crmd: info: abort_transition_graph:
> te_update_diff:126 - Triggered transition abort (complete=1, tag=diff,
> id=(null), magic=NA, cib=0.781.1) : Non-status change
> Mar 14 15:06:18 [1786] tdtestclu02 crmd: notice: do_state_transition:
> State transition S_IDLE -> S_POLICY_ENGINE [ input=I_PE_CALC
> cause=C_FSA_INTERNAL origin=abort_transition_graph ]
> Mar 14 15:06:18 [1781] tdtestclu02 cib: info: cib_replace_notify:
> Replaced: 0.780.39 -> 0.781.1 from tdtestclu01
>
> So I am not sure how to remove that constraint on a permanent basis ...
> it is gone as long as I don't stop pacemaker.

This constraint is added by the DRBD primary if it loses the connection
to its peer, and it is perfectly fine if you stopped one node. Once the
DRBD resync has finished, it will be removed from the cluster
configuration again automatically ... you typically never need to
remove such drbd-fence constraints manually, only in some rare failure
scenarios.
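For reference, the constraint comes from DRBD's fence-peer handler, and
the matching after-resync-target handler is what deletes it again once
the resync is done. A minimal sketch of the relevant section in
global_common.conf, assuming the stock handler scripts shipped with the
DRBD package (verify the paths on your installation):

    disk {
            fencing resource-only;
    }
    handlers {
            # assumed stock script locations from the DRBD package
            fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
            after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
    }

In one of those rare cases (e.g. the unfence handler could not run),
and only after /proc/drbd shows UpToDate/UpToDate on both sides, the
constraint can be deleted by its id:

    crm configure delete drbd-fence-by-handler-nfs-ms_drbd_nfs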
Regards,
Andreas

> But it used to work, both with no-quorum-policy=freeze and with that
> constraint in place.
>
> Kind regards
> /Fredrik
>
> On Thu, Mar 14, 2013 at 2:49 PM, Andreas Kurz <[email protected]> wrote:
>> On 2013-03-14 13:30, Fredrik Hudner wrote:
>>> Hi all,
>>>
>>> I have a problem after I removed a node with the force command from
>>> my crm config.
>>>
>>> Originally I had 2 nodes running an HA cluster (corosync 1.4.1-7.el6,
>>> pacemaker 1.1.7-6.el6). Then I wanted to add a third node acting as
>>> a quorum node, but was not able to get it to work (probably because
>>> I don't understand how to set it up). So I removed the 3rd node, but
>>> had to use the force command, as crm complained when I tried to
>>> remove it.
>>>
>>> Now when I start up Pacemaker, the resources don't look like they
>>> come up correctly:
>>>
>>> Online: [ testclu01 testclu02 ]
>>>
>>>  Master/Slave Set: ms_drbd_nfs [p_drbd_nfs]
>>>      Masters: [ testclu01 ]
>>>      Slaves: [ testclu02 ]
>>>  Clone Set: cl_lsb_nfsserver [p_lsb_nfsserver]
>>>      Started: [ tdtestclu01 tdtestclu02 ]
>>>  Resource Group: g_nfs
>>>      p_lvm_nfs     (ocf::heartbeat:LVM):          Started testclu01
>>>      p_fs_shared   (ocf::heartbeat:Filesystem):   Started testclu01
>>>      p_fs_shared2  (ocf::heartbeat:Filesystem):   Started testclu01
>>>      p_ip_nfs      (ocf::heartbeat:IPaddr2):      Started testclu01
>>>  Clone Set: cl_exportfs_root [p_exportfs_root]
>>>      Started: [ testclu01 testclu02 ]
>>>
>>> Failed actions:
>>>     p_exportfs_root:0_monitor_30000 (node=testclu01, call=12, rc=7,
>>>     status=complete): not running
>>>     p_exportfs_root:1_monitor_30000 (node=testclu02, call=12, rc=7,
>>>     status=complete): not running
>>>
>>> The filesystems mount correctly on the master at this stage and can
>>> be written to. When I stop the services on the master node for it to
>>> fail over, it doesn't work ... it loses cluster-IP connectivity.
>>
>> Fix your "no-quorum-policy": you want to "ignore" the quorum in a
>> two-node cluster to allow failover ... and if your drbd device is
>> already in sync, remove that drbd-fence-by-handler-nfs-ms_drbd_nfs
>> constraint.
>>
>> Regards,
>> Andreas
>>
>> --
>> Need help with Pacemaker?
>> http://www.hastexo.com/now
>>
>>> Corosync.log from the master after I stopped pacemaker on the master
>>> node: see attached file.
>>>
>>> Additional files (attached): crm configure show, Corosync.conf,
>>> Global_common.conf
>>>
>>> I'm not sure how to proceed to get it up in a fair state now, so if
>>> anyone could help me, it would be much appreciated.
>>>
>>> Kind regards
>>> /Fredrik
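One more note on the quorum advice quoted above, for the archives: in a
plain two-node cluster there is no majority left once a node fails, so
anything other than "ignore" blocks failover unless you add a third
(quorum-only) node. Setting it is a single cluster property, e.g. with
the crm shell:

    crm configure property no-quorum-policy=ignore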
