Much appreciated, Andreas. Thanks for your help!

On Wed, Mar 20, 2013 at 3:58 PM, Andreas Kurz <[email protected]> wrote:
> On 2013-03-20 13:30, Fredrik Hudner wrote:
> > I presume you are correct about that. (see drbdadm-dump.txt)
> >
> > fence-peer /usr/lib/drbd/crm-fence-peer.sh;
> > after-resync-target /usr/lib/drbd/unsnapshot-resync-target-lvm.sh;
> > after-resync-target /usr/lib/drbd/crm-unfence-peer.sh;
>
> ... to remove the constraint once the secondary is in sync again after a
> resync run.
>
> Regards,
> Andreas
>
> > What would I need to do to overwrite it?
> > Or if you have a nicer way to do it... It's never easy to take over
> > someone else's configuration.
> >
> > Kind regards
> > /Fredrik
> >
> > On Tue, Mar 19, 2013 at 11:32 PM, Andreas Kurz <[email protected]> wrote:
> >
> >> On 2013-03-19 16:02, Fredrik Hudner wrote:
> >>> Just wanted to change what document it's been built from.. It should be
> >>> "LINBIT DRBD 8.4 Configuration Guide: NFS on RHEL 6".
> >>
> >> There is again that fencing constraint in your configuration ... what
> >> does "drbdadm dump all" look like? Any chance you only specified a
> >> fence-peer handler in your resource configuration but didn't overwrite
> >> the after-resync-target handler you specified in your
> >> global_common.conf? That would explain the dangling constraint that
> >> will prevent a failover.
> >>
> >> Regards,
> >> Andreas
> >>
> >> --
> >> Need help with Pacemaker?
> >> http://www.hastexo.com/now
> >>
> >>> ---------- Forwarded message ----------
> >>> From: Fredrik Hudner <[email protected]>
> >>> Date: Mon, Mar 18, 2013 at 11:06 AM
> >>> Subject: Re: [Linux-HA] Problem promoting slave to master
> >>> To: General Linux-HA mailing list <[email protected]>
> >>>
> >>> On Fri, Mar 15, 2013 at 1:04 AM, Andreas Kurz <[email protected]> wrote:
> >>>
> >>>> On 2013-03-14 15:52, Fredrik Hudner wrote:
> >>>>> I set no-quorum-policy to ignore and removed the constraint you
> >>>>> mentioned. It then managed to fail over once to the slave node, but I
> >>>>> still have these:
> >>>>>
> >>>>> Failed actions:
> >>>>>     p_exportfs_root:0_monitor_30000 (node=testclu01, call=12, rc=7,
> >>>>>         status=complete): not running
> >>>>>     p_exportfs_root:1_monitor_30000 (node=testclu02, call=12, rc=7,
> >>>>>         status=complete): not running
> >>>>
> >>>> This only tells you that monitoring of these resources once found them
> >>>> not running ... the logs should tell you what happened and when.
> >>>
> >>> I have attached the logs from master and slave.. I can see that it
> >>> stops, but not really why (too limited knowledge to read the logs).
> >>>
> >>>>> I then stopped the new master node to see if it failed over to the
> >>>>> other node, with no success.. It remains slave.
> >>>>
> >>>> Hard to say without seeing the current cluster state, e.g. a
> >>>> "crm_mon -1frA", "cat /proc/drbd" and some logs ... not enough
> >>>> information ...
> >>>
> >>> I have attached the output from crm_mon, the new crm configure and
> >>> /proc/drbd.
> >>>
> >>>>> I also noticed that the constraint drbd-fence-by-handler-nfs-ms_drbd_nfs
> >>>>> was back in the crm configure. Seems like cib makes a replace.
> >>>>
> >>>> This constraint is added by the DRBD primary if it loses connection to
> >>>> its peer and is perfectly fine if you stopped one node.
> >>>
> >>> Seems like the cluster has a problem attaching to the cluster node IP,
> >>> but I'm not sure why.
> >>>
> >>> I would like to add that I took over this configuration from a guy who
> >>> has left, but I know that it was configured using the technical
> >>> documentation from LINBIT, "Highly available NFS storage with DRBD and
> >>> Pacemaker".
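For reference, the fence-peer and after-resync-target handlers discussed in
this thread belong in the "handlers" section of the DRBD configuration. A
minimal sketch of a global_common.conf using only the script paths already
quoted above; the "fencing resource-only;" disk option is an assumption not
shown in the thread, but the handlers only fire when a fencing policy is set:

    common {
        disk {
            # assumed here: required for the fence-peer handler to be invoked
            fencing resource-only;
        }
        handlers {
            # adds the constraint when the peer is lost
            fence-peer "/usr/lib/drbd/crm-fence-peer.sh";
            # removes it again after a successful resync
            after-resync-target "/usr/lib/drbd/crm-unfence-peer.sh";
        }
    }

Note that crm-fence-peer.sh creates exactly the kind of location constraint
(drbd-fence-by-handler-...) seen in this thread, and crm-unfence-peer.sh is
what removes it once the resync target is up to date.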
> >>> > >>>> > >>>>> Mar 14 15:06:18 [1786] tdtestclu02 crmd: info: > >>>>> abort_transition_graph: te_update_diff:126 - Triggered > >> transition > >>>>> abort (complete=1, tag=diff, id=(null), magic=NA, cib=0.781.1) : > >>>> Non-status > >>>>> change > >>>>> Mar 14 15:06:18 [1786] tdtestclu02 crmd: notice: > >>>>> do_state_transition: State transition S_IDLE -> S_POLICY_ENGINE [ > >>>>> input=I_PE_CALC cause=C_FSA_INTERNAL origin=abort_transition_graph ] > >>>>> Mar 14 15:06:18 [1781] tdtestclu02 cib: info: > >>>>> cib_replace_notify: Replaced: 0.780.39 -> 0.781.1 from tdtestclu01 > >>>>> > >>>>> So not sure how to remove that constraint on a permanent basis.. it's > >>>> gone > >>>>> as long as I don't stop pacemaker. > >>>> > >>>> Once the DRBD resync is finished it will be removed from the cluster > >>>> configuration again automatically... you typically never need to > remove > >>>> such drbd-fence-constraints manually only in some rare failure > >> scenarios. > >>>> > >>>> Regards, > >>>> Andreas > >>>> > >>>> > >>>>> > >>>>> But it used to work booth with the no-quorom-policy=freeze and that > >>>>> constraint > >>>>> > >>>>> Kind regards > >>>>> /Fredrik > >>>>> > >>>>> > >>>>> > >>>>> On Thu, Mar 14, 2013 at 2:49 PM, Andreas Kurz <[email protected]> > >>>> wrote: > >>>>> > >>>>>> On 2013-03-14 13:30, Fredrik Hudner wrote: > >>>>>>> Hi all, > >>>>>>> > >>>>>>> I have a problem after I removed a node with the force command from > >> my > >>>>>> crm > >>>>>>> config. > >>>>>>> > >>>>>>> Originally I had 2 nodes running HA cluster (corosync 1.4.1-7.el6, > >>>>>>> pacemaker 1.1.7-6.el6) > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> Then I wanted to add a third node acting as quorum node, but was > not > >>>> able > >>>>>>> to get it to work (probably because I don’t understand how to set > it > >>>> up). > >>>>>>> > >>>>>>> So I removed the 3rd node, but had to use the force command as crm > >>>>>>> complained when I tried to remove it. 
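For reference, the two fixes discussed in this thread (ignore quorum in a
two-node cluster; drop a left-over fence constraint once DRBD is back in
sync) can be applied with the crm shell. A sketch, with the constraint id
taken from the thread; only remove the constraint after verifying the
resync has actually finished:

    # allow failover despite loss of quorum in a two-node cluster
    crm configure property no-quorum-policy=ignore

    # verify both nodes report UpToDate/UpToDate first
    cat /proc/drbd

    # then remove the dangling fence constraint, if it is still present
    crm configure delete drbd-fence-by-handler-nfs-ms_drbd_nfs

As Andreas notes, the constraint is normally removed automatically by the
after-resync-target handler; deleting it by hand is only for the rare case
where it is left dangling.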
> >>>>>>>
> >>>>>>> Now when I start up Pacemaker the resources don't look like they
> >>>>>>> come up correctly:
> >>>>>>>
> >>>>>>> Online: [ testclu01 testclu02 ]
> >>>>>>>
> >>>>>>>  Master/Slave Set: ms_drbd_nfs [p_drbd_nfs]
> >>>>>>>      Masters: [ testclu01 ]
> >>>>>>>      Slaves: [ testclu02 ]
> >>>>>>>  Clone Set: cl_lsb_nfsserver [p_lsb_nfsserver]
> >>>>>>>      Started: [ tdtestclu01 tdtestclu02 ]
> >>>>>>>  Resource Group: g_nfs
> >>>>>>>      p_lvm_nfs    (ocf::heartbeat:LVM):        Started testclu01
> >>>>>>>      p_fs_shared  (ocf::heartbeat:Filesystem): Started testclu01
> >>>>>>>      p_fs_shared2 (ocf::heartbeat:Filesystem): Started testclu01
> >>>>>>>      p_ip_nfs     (ocf::heartbeat:IPaddr2):    Started testclu01
> >>>>>>>  Clone Set: cl_exportfs_root [p_exportfs_root]
> >>>>>>>      Started: [ testclu01 testclu02 ]
> >>>>>>>
> >>>>>>> Failed actions:
> >>>>>>>     p_exportfs_root:0_monitor_30000 (node=testclu01, call=12, rc=7,
> >>>>>>>         status=complete): not running
> >>>>>>>     p_exportfs_root:1_monitor_30000 (node=testclu02, call=12, rc=7,
> >>>>>>>         status=complete): not running
> >>>>>>>
> >>>>>>> The filesystems mount correctly on the master at this stage and can
> >>>>>>> be written to.
> >>>>>>>
> >>>>>>> When I stop the services on the master node for it to fail over, it
> >>>>>>> doesn't work.. It loses cluster-IP connectivity.
> >>>>>>
> >>>>>> Fix your "no-quorum-policy"; you want to "ignore" quorum in a
> >>>>>> two-node cluster to allow failover ... and if your drbd device is
> >>>>>> already in sync, remove that drbd-fence-by-handler-nfs-ms_drbd_nfs
> >>>>>> constraint.
> >>>>>>
> >>>>>> Regards,
> >>>>>> Andreas
> >>>>>>
> >>>>>> --
> >>>>>> Need help with Pacemaker?
> >>>>>> http://www.hastexo.com/now
> >>>>>>
> >>>>>>> Corosync.log from master after I stopped pacemaker on the master
> >>>>>>> node: see attached file.
> >>>>>>>
> >>>>>>> Additional files (attached): crm-configure show
> >>>>>>>                              Corosync.conf
> >>>>>>>                              Global_common.conf
> >>>>>>>
> >>>>>>> I'm not sure how to proceed to get it into a fair state now,
> >>>>>>> so if anyone could help me it would be much appreciated.
> >>>>>>>
> >>>>>>> Kind regards
> >>>>>>>
> >>>>>>> /Fredrik

--
Fredrik Hudner
Grosse Pfahlstr 12
30161 Hannover
Germany
Tel: 0511-642 09 548
Mob: 0173-254 39 29

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
