On 2013-03-20 13:30, Fredrik Hudner wrote:
> I presume you are correct about that. (see drbdadm-dump.txt)
>
> fence-peer /usr/lib/drbd/crm-fence-peer.sh;
> after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
after-resync-target /usr/lib/drbd/crm-unfence-peer.sh;

... to remove the constraint once the secondary is in sync again after a
resync run.

Regards,
Andreas

> What would I need to do to overwrite it?
> Or if you have a nicer way to do it... It's never easy to take over
> someone else's configuration.
>
> Kind regards
> /Fredrik
>
> On Tue, Mar 19, 2013 at 11:32 PM, Andreas Kurz <[email protected]> wrote:
>
>> On 2013-03-19 16:02, Fredrik Hudner wrote:
>>> Just wanted to change what document it's been built from... It should
>>> be "LINBIT DRBD 8.4 Configuration Guide: NFS on RHEL 6".
>>
>> There is again that fencing constraint in your configuration ... what
>> does "drbdadm dump all" look like? Any chance you only specified a
>> fence-peer handler in your resource configuration but didn't override
>> the after-resync-target handler you specified in your
>> global_common.conf? That would explain the dangling constraint that
>> will prevent a failover.
>>
>> Regards,
>> Andreas
>>
>> --
>> Need help with Pacemaker?
>> http://www.hastexo.com/now
>>
>>> ---------- Forwarded message ----------
>>> From: Fredrik Hudner <[email protected]>
>>> Date: Mon, Mar 18, 2013 at 11:06 AM
>>> Subject: Re: [Linux-HA] Problem promoting slave to master
>>> To: General Linux-HA mailing list <[email protected]>
>>>
>>> On Fri, Mar 15, 2013 at 1:04 AM, Andreas Kurz <[email protected]> wrote:
>>>
>>>> On 2013-03-14 15:52, Fredrik Hudner wrote:
>>>>> I set no-quorum-policy to ignore and removed the constraint you
>>>>> mentioned.
>>>>> It then managed to fail over once to the slave node, but I still
>>>>> have those:
>>>>>
>>>>> Failed actions:
>>>>>
>>>>> p_exportfs_root:0_monitor_30000 (node=testclu01, call=12, rc=7,
>>>>> status=complete): not running
>>>>>
>>>>> p_exportfs_root:1_monitor_30000 (node=testclu02, call=12, rc=7,
>>>>> status=complete): not running
>>>>
>>>> This only tells you that monitoring of these resources once found
>>>> them not running ... the logs should tell you what happened and when.
>>>
>>> I have attached the logs from master and slave. I can see that it
>>> stops, but not really why (my knowledge is too limited to read the
>>> logs).
>>>
>>>>> I then stopped the new master node to see if it failed over to the
>>>>> other node, with no success. It remains slave.
>>>>
>>>> Hard to say without seeing the current cluster state like a
>>>> "crm_mon -1frA", "cat /proc/drbd" and some logs ... not enough
>>>> information ...
>>>
>>> I have attached the output from crm_mon, the new crm configuration
>>> and /proc/drbd.
>>>
>>>>> I also noticed that the constraint
>>>>> drbd-fence-by-handler-nfs-ms_drbd_nfs was back in the crm
>>>>> configuration. Seems like the cib makes a replace.
>>>>
>>>> This constraint is added by the DRBD primary if it loses connection
>>>> to its peer and is perfectly fine if you stopped one node.
>>>
>>> It seems like the cluster has a problem attaching to the cluster node
>>> IP, but I'm not sure why.
>>>
>>> I would like to add that I took over this configuration from a guy
>>> who has left, but I know that it was configured using the technical
>>> documentation from LINBIT, "Highly available NFS storage with DRBD
>>> and Pacemaker".
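For reference, the per-resource handler override Andreas describes would
look roughly like this in the DRBD resource file. The handler paths are
the ones quoted at the top of this thread; the resource name "nfs" is
inferred from the constraint id drbd-fence-by-handler-nfs-ms_drbd_nfs and
is an assumption, not taken from the actual dump:

    resource nfs {
      handlers {
        fence-peer /usr/lib/drbd/crm-fence-peer.sh;
        # overrides the unsnapshot handler inherited from
        # global_common.conf, so the fencing constraint is
        # removed once the resync has finished:
        after-resync-target /usr/lib/drbd/crm-unfence-peer.sh;
      }
      # disk/device/net sections as in the existing configuration
    }

If both the unsnapshot and the unfence behaviour are wanted, a small
wrapper script calling one handler after the other would be needed, since
DRBD runs only a single after-resync-target handler per resource.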
>>>
>>>>> Mar 14 15:06:18 [1786] tdtestclu02 crmd: info:
>>>>> abort_transition_graph: te_update_diff:126 - Triggered transition
>>>>> abort (complete=1, tag=diff, id=(null), magic=NA, cib=0.781.1) :
>>>>> Non-status change
>>>>> Mar 14 15:06:18 [1786] tdtestclu02 crmd: notice:
>>>>> do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [
>>>>> input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ]
>>>>> Mar 14 15:06:18 [1781] tdtestclu02 cib: info:
>>>>> cib_replace_notify: Replaced: 0.780.39 -> 0.781.1 from tdtestclu01
>>>>>
>>>>> So I'm not sure how to remove that constraint on a permanent
>>>>> basis... it's gone as long as I don't stop pacemaker.
>>>>
>>>> Once the DRBD resync is finished it will be removed from the cluster
>>>> configuration again automatically ... you typically never need to
>>>> remove such drbd-fence constraints manually, only in some rare
>>>> failure scenarios.
>>>>
>>>> Regards,
>>>> Andreas
>>>>
>>>>> But it used to work both with no-quorum-policy=freeze and that
>>>>> constraint.
>>>>>
>>>>> Kind regards
>>>>> /Fredrik
>>>>>
>>>>> On Thu, Mar 14, 2013 at 2:49 PM, Andreas Kurz <[email protected]> wrote:
>>>>>
>>>>>> On 2013-03-14 13:30, Fredrik Hudner wrote:
>>>>>>> Hi all,
>>>>>>>
>>>>>>> I have a problem after I removed a node with the force command
>>>>>>> from my crm config.
>>>>>>>
>>>>>>> Originally I had 2 nodes running an HA cluster (corosync
>>>>>>> 1.4.1-7.el6, pacemaker 1.1.7-6.el6).
>>>>>>>
>>>>>>> Then I wanted to add a third node acting as a quorum node, but
>>>>>>> was not able to get it to work (probably because I don't
>>>>>>> understand how to set it up).
>>>>>>>
>>>>>>> So I removed the 3rd node, but had to use the force command as
>>>>>>> crm complained when I tried to remove it.
>>>>>>>
>>>>>>> Now when I start up Pacemaker the resources don't look like they
>>>>>>> come up correctly:
>>>>>>>
>>>>>>> Online: [ testclu01 testclu02 ]
>>>>>>>
>>>>>>> Master/Slave Set: ms_drbd_nfs [p_drbd_nfs]
>>>>>>>     Masters: [ testclu01 ]
>>>>>>>     Slaves: [ testclu02 ]
>>>>>>> Clone Set: cl_lsb_nfsserver [p_lsb_nfsserver]
>>>>>>>     Started: [ tdtestclu01 tdtestclu02 ]
>>>>>>> Resource Group: g_nfs
>>>>>>>     p_lvm_nfs (ocf::heartbeat:LVM): Started testclu01
>>>>>>>     p_fs_shared (ocf::heartbeat:Filesystem): Started testclu01
>>>>>>>     p_fs_shared2 (ocf::heartbeat:Filesystem): Started testclu01
>>>>>>>     p_ip_nfs (ocf::heartbeat:IPaddr2): Started testclu01
>>>>>>> Clone Set: cl_exportfs_root [p_exportfs_root]
>>>>>>>     Started: [ testclu01 testclu02 ]
>>>>>>>
>>>>>>> Failed actions:
>>>>>>>     p_exportfs_root:0_monitor_30000 (node=testclu01, call=12,
>>>>>>>     rc=7, status=complete): not running
>>>>>>>     p_exportfs_root:1_monitor_30000 (node=testclu02, call=12,
>>>>>>>     rc=7, status=complete): not running
>>>>>>>
>>>>>>> The filesystems mount correctly on the master at this stage and
>>>>>>> can be written to.
>>>>>>>
>>>>>>> When I stop the services on the master node for it to fail over,
>>>>>>> it doesn't work... it loses cluster IP connectivity.
>>>>>>
>>>>>> Fix your "no-quorum-policy"; you want to "ignore" the quorum in a
>>>>>> two-node cluster to allow failover ... and if your drbd device is
>>>>>> already in sync, remove that
>>>>>> drbd-fence-by-handler-nfs-ms_drbd_nfs constraint.
>>>>>>
>>>>>> Regards,
>>>>>> Andreas
>>>>>>
>>>>>> --
>>>>>> Need help with Pacemaker?
>>>>>> http://www.hastexo.com/now
>>>>>>
>>>>>>> Corosync.log from master after I stopped pacemaker on the master
>>>>>>> node: see attached file.
>>>>>>>
>>>>>>> Additional files (attached): crm configure show, corosync.conf,
>>>>>>> global_common.conf
>>>>>>>
>>>>>>> I'm not sure how to proceed to get it into a fair state now, so
>>>>>>> if anyone could help me it would be much appreciated.
>>>>>>>
>>>>>>> Kind regards
>>>>>>> /Fredrik
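On the crm shell, the two changes suggested for the immediate failover
problem (ignore quorum in the two-node cluster, and drop the dangling
fencing constraint once DRBD is in sync) would look something like the
following sketch; the constraint id is the one shown in this thread:

    crm configure property no-quorum-policy=ignore
    # only after "cat /proc/drbd" shows Connected and UpToDate/UpToDate:
    crm configure delete drbd-fence-by-handler-nfs-ms_drbd_nfs

As Andreas notes, deleting the constraint by hand is normally unnecessary,
since a correctly configured after-resync-target handler removes it on its
own after the resync.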
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
