Much appreciated, Andreas. Thanks for your help!

On Wed, Mar 20, 2013 at 3:58 PM, Andreas Kurz <[email protected]> wrote:
> On 2013-03-20 13:30, Fredrik Hudner wrote:
> > I presume you are correct about that. (see drbdadm-dump.txt)
> >
> > fence-peer /usr/lib/drbd/crm-fence-peer.sh;
> > after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
> > after-resync-target /usr/lib/drbd/crm-unfence-peer.sh;
>
> ... to remove the constraint once the secondary is in sync again after a
> resync run.
>
> Regards,
> Andreas
>
> > What would I need to do to overwrite it?
> > Or if you have a nicer way to do it... It's never easy to take over
> > someone else's configuration.
> >
> > Kind regards
> > /Fredrik
> >
> > On Tue, Mar 19, 2013 at 11:32 PM, Andreas Kurz <[email protected]> wrote:
> >
> >> On 2013-03-19 16:02, Fredrik Hudner wrote:
> >>> Just wanted to change what document it's been built from.. It should be
> >>> "LINBIT DRBD 8.4 Configuration Guide: NFS on RHEL 6".
> >>
> >> There is again that fencing constraint in your configuration ... what
> >> does "drbdadm dump all" look like? Any chance you only specified a
> >> fence-peer handler in your resource configuration but didn't overwrite
> >> the after-resync-target handler you specified in your
> >> global_common.conf? That would explain the dangling constraint that
> >> will prevent a failover.
> >>
> >> Regards,
> >> Andreas
> >>
> >> --
> >> Need help with Pacemaker?
> >> http://www.hastexo.com/now
> >>
> >>> ---------- Forwarded message ----------
> >>> From: Fredrik Hudner <[email protected]>
> >>> Date: Mon, Mar 18, 2013 at 11:06 AM
> >>> Subject: Re: [Linux-HA] Problem promoting slave to master
> >>> To: General Linux-HA mailing list <[email protected]>
> >>>
> >>> On Fri, Mar 15, 2013 at 1:04 AM, Andreas Kurz <[email protected]> wrote:
> >>>
> >>>> On 2013-03-14 15:52, Fredrik Hudner wrote:
> >>>>> I set no-quorum-policy to ignore and removed the constraint you
> >>>>> mentioned. It then managed to fail over once to the slave node, but I
> >>>>> still have these:
> >>>>>
> >>>>> Failed actions:
> >>>>>     p_exportfs_root:0_monitor_30000 (node=testclu01, call=12, rc=7,
> >>>>>         status=complete): not running
> >>>>>     p_exportfs_root:1_monitor_30000 (node=testclu02, call=12, rc=7,
> >>>>>         status=complete): not running
> >>>>
> >>>> This only tells you that monitoring of these resources once found them
> >>>> not running ... the logs should tell you what happened and when.
> >>>
> >>> I have attached the logs from master and slave.. I can see that it
> >>> stops, but not really why (too limited knowledge to read the logs).
> >>>
> >>>>> I then stopped the new master node to see if it failed over to the
> >>>>> other node, with no success.. It remains slave.
> >>>>
> >>>> Hard to say without seeing the current cluster state, e.g. a
> >>>> "crm_mon -1frA", "cat /proc/drbd" and some logs ... not enough
> >>>> information ...
> >>>
> >>> I have attached the output from crm_mon, the new crm configure and
> >>> /proc/drbd.
> >>>
> >>>>> I also noticed that the constraint drbd-fence-by-handler-nfs-ms_drbd_nfs
> >>>>> was back in the crm configure. Seems like cib makes a replace.
> >>>>
> >>>> This constraint is added by the DRBD primary if it loses connection to
> >>>> its peer and is perfectly fine if you stopped one node.
> >>>
> >>> Seems like the cluster has a problem attaching to the cluster node IP,
> >>> but I'm not sure why.
> >>>
> >>> I would like to add that I took over this configuration from a guy who
> >>> has left, but I know that it was configured using the technical
> >>> documentation from LINBIT, "Highly available NFS storage with DRBD and
> >>> Pacemaker".
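For reference, the fence-peer and after-resync-target handlers discussed in
this thread belong in the "handlers" section of the DRBD configuration. A
minimal sketch of a global_common.conf using only the script paths already
quoted above; the "fencing resource-only;" disk option is an assumption not
shown in the thread, but the handlers only fire when a fencing policy is set:

    common {
        disk {
            # assumed here: required for the fence-peer handler to be invoked
            fencing resource-only;
        }
        handlers {
            # adds the constraint when the peer is lost
            fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
            # removes it again after a successful resync
            after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
        }
    }

Note that crm-fence-peer.sh creates exactly the kind of location constraint
(drbd-fence-by-handler-...) seen in this thread, and crm-unfence-peer.sh is
what removes it once the resync target is up to date.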
> >>> > >>>> > >>>>> Mar 14 15:06:18 [1786] tdtestclu02 crmd: info: > >>>>> abort_transition_graph: te_update_diff:126 - Triggered > >> transition > >>>>> abort (complete=1, tag=diff, id=(null), magic=NA, cib=0.781.1) : > >>>> Non-status > >>>>> change > >>>>> Mar 14 15:06:18 [1786] tdtestclu02 crmd: notice: > >>>>> do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ > >>>>> input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ] > >>>>> Mar 14 15:06:18 [1781] tdtestclu02 cib: info: > >>>>> cib_replace_notify: Replaced: 0.780.39 -> 0.781.1 from tdtestclu01 > >>>>> > >>>>> So not sure how to remove that constraint on a permanent basis.. it's > >>>> gone > >>>>> as long as I don't stop pacemaker. > >>>> > >>>> Once the DRBD resync is finished it will be removed from the cluster > >>>> configuration again automatically... you typically never need to > remove > >>>> such drbd-fence-constraints manually only in some rare failure > >> scenarios. > >>>> > >>>> Regards, > >>>> Andreas > >>>> > >>>> > >>>>> > >>>>> But it used to work booth with the no-quorom-policy=freeze and that > >>>>> constraint > >>>>> > >>>>> Kind regards > >>>>> /Fredrik > >>>>> > >>>>> > >>>>> > >>>>> On Thu, Mar 14, 2013 at 2:49 PM, Andreas Kurz <[email protected]> > >>>> wrote: > >>>>> > >>>>>> On 2013-03-14 13:30, Fredrik Hudner wrote: > >>>>>>> Hi all, > >>>>>>> > >>>>>>> I have a problem after I removed a node with the force command from > >> my > >>>>>> crm > >>>>>>> config. > >>>>>>> > >>>>>>> Originally I had 2 nodes running HA cluster (corosync 1.4.1-7.el6, > >>>>>>> pacemaker 1.1.7-6.el6) > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> Then I wanted to add a third node acting as quorum node, but was > not > >>>> able > >>>>>>> to get it to work (probably because I don’t understand how to set > it > >>>> up). > >>>>>>> > >>>>>>> So I removed the 3rd node, but had to use the force command as crm > >>>>>>> complained when I tried to remove it. 
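For reference, the two fixes discussed in this thread (ignore quorum in a
two-node cluster; drop a left-over fence constraint once DRBD is back in
sync) can be applied with the crm shell. A sketch, with the constraint id
taken from the thread; only remove the constraint after verifying the
resync has actually finished:

    # allow failover despite loss of quorum in a two-node cluster
    crm configure property no-quorum-policy=ignore

    # verify both nodes report UpToDate/UpToDate first
    cat /proc/drbd

    # then remove the dangling fence constraint, if it is still present
    crm configure delete drbd-fence-by-handler-nfs-ms_drbd_nfs

As Andreas notes, the constraint is normally removed automatically by the
after-resync-target handler; deleting it by hand is only for the rare case
where it is left dangling.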
> >>>>>>>
> >>>>>>> Now when I start up Pacemaker the resources don't look like they
> >>>>>>> come up correctly:
> >>>>>>>
> >>>>>>> Online: [ testclu01 testclu02 ]
> >>>>>>>
> >>>>>>>  Master/Slave Set: ms_drbd_nfs [p_drbd_nfs]
> >>>>>>>      Masters: [ testclu01 ]
> >>>>>>>      Slaves: [ testclu02 ]
> >>>>>>>  Clone Set: cl_lsb_nfsserver [p_lsb_nfsserver]
> >>>>>>>      Started: [ tdtestclu01 tdtestclu02 ]
> >>>>>>>  Resource Group: g_nfs
> >>>>>>>      p_lvm_nfs    (ocf::heartbeat:LVM):        Started testclu01
> >>>>>>>      p_fs_shared  (ocf::heartbeat:Filesystem): Started testclu01
> >>>>>>>      p_fs_shared2 (ocf::heartbeat:Filesystem): Started testclu01
> >>>>>>>      p_ip_nfs     (ocf::heartbeat:IPaddr2):    Started testclu01
> >>>>>>>  Clone Set: cl_exportfs_root [p_exportfs_root]
> >>>>>>>      Started: [ testclu01 testclu02 ]
> >>>>>>>
> >>>>>>> Failed actions:
> >>>>>>>     p_exportfs_root:0_monitor_30000 (node=testclu01, call=12, rc=7,
> >>>>>>>         status=complete): not running
> >>>>>>>     p_exportfs_root:1_monitor_30000 (node=testclu02, call=12, rc=7,
> >>>>>>>         status=complete): not running
> >>>>>>>
> >>>>>>> The filesystems mount correctly on the master at this stage and can
> >>>>>>> be written to.
> >>>>>>>
> >>>>>>> When I stop the services on the master node for it to fail over, it
> >>>>>>> doesn't work.. It loses cluster-IP connectivity.
> >>>>>>
> >>>>>> Fix your "no-quorum-policy"; you want to "ignore" quorum in a
> >>>>>> two-node cluster to allow failover ... and if your drbd device is
> >>>>>> already in sync, remove that drbd-fence-by-handler-nfs-ms_drbd_nfs
> >>>>>> constraint.
> >>>>>>
> >>>>>> Regards,
> >>>>>> Andreas
> >>>>>>
> >>>>>> --
> >>>>>> Need help with Pacemaker?
> >>>>>> http://www.hastexo.com/now
> >>>>>>
> >>>>>>> Corosync.log from master after I stopped pacemaker on the master
> >>>>>>> node: see attached file.
> >>>>>>>
> >>>>>>> Additional files (attached): crm-configure show
> >>>>>>>                              Corosync.conf
> >>>>>>>                              Global_common.conf
> >>>>>>>
> >>>>>>> I'm not sure how to proceed to get it into a fair state now,
> >>>>>>> so if anyone could help me it would be much appreciated.
> >>>>>>>
> >>>>>>> Kind regards
> >>>>>>>
> >>>>>>> /Fredrik

--
Fredrik Hudner
Grosse Pfahlstr 12
30161 Hannover
Germany
Tel: 0511-642 09 548
Mob: 0173-254 39 29

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
