I now have a working configuration with DRBD master/slave, and a
filesystem/pgsql/ipaddr group following it around. So far, I've been
using a place (rsc_location) constraint and modifying its uname value to
test failover of the resources. Can someone suggest a reasonable set of
tests that most people use to verify other possible error conditions
(short of pulling the plug on one of the servers)? Also, the place
constraint is on the DRBD master/slave; does that make sense, or should
it be placed on one of the "higher level" resources, such as the
filesystem or pgsql?
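
(For context, failure-condition tests of this sort are often collected
into a small script. The sketch below is purely illustrative and not from
this thread: the resource and interface names (postmaster, /mnt/pgdata,
r0, eth1) and the crm_standby invocation are assumptions to adapt to your
cluster, and it is safe by default because DRY_RUN=1 only prints what it
would do.)

```shell
#!/bin/sh
# Hypothetical fault-injection checklist; every name below is an
# assumption, not taken from the thread. With DRY_RUN=1 (the default)
# nothing destructive runs; the script only prints the commands.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" -eq 1 ]; then
        echo "WOULD RUN: $*"
    else
        "$@"
    fi
}

run killall -9 postmaster    # kill the database; the monitor op should notice
run umount -l /mnt/pgdata    # yank the filesystem out from under the group
run drbdadm disconnect r0    # break replication without killing a node
run ifconfig eth1 down       # drop the replication/heartbeat link
run crm_standby -v on        # administrative failover via standby mode
```

Run it once with DRY_RUN=1 to review the plan, then flip individual lines
to live mode one at a time and watch crm_mon between injections.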

Thanks,
Doug

On Thu, 2007-04-26 at 09:45 -0400, Doug Knight wrote:

> Hi Alastair,
> Have you encountered a situation where, when you first start up the drbd
> master/slave resource, crm_mon and/or the GUI indicates Master status on
> one node and Started status on the other (as opposed to Slave)? If so,
> how did you correct it?
> 
> Doug
> p.s. Thanks for the scripts and xml, they're a big help!
> 
> On Mon, 2007-04-23 at 16:41 -0700, Alastair N. Young wrote:
> 
> > Attached is the cib I am using. By adjusting the scores on the
> > drbd_m_like_ rules, I can migrate the drbd master between nodes; the
> > filesystem cleanly unmounts first and remounts on the new master afterwards.
> > 
> > What I also need it to do is migrate the services in response to a
> > failure or other score change in the grp_www group. I've tried many
> > permutations and I can't figure this out. The best I can come up with
> > is that the rsc_www_fs resource fails in place after I manually
> > unmount it a few times. At worst, Bad Things Happen. 
> > 
> > As best I can guess, grp_www won't move to the slave node no matter
> > what. Perhaps because of the -INFINITY score in the colocation? 
> > 
> > What I need is to have the other node become master and then have
> > grp_www start on it. Essentially I need the master state of drbd-ms to
> > effectively be the first member of grp_www. I know that cannot be done
> > overtly, but how does one get that effect?
> > 
> > What's the incantation to get the master_slave to change master in
> > response to a failure or score change on a colocated service?
> > 
> > I am running Heartbeat 2.0.8 on CentOS 4.4 i386 under VMware.
> > DRBD is v0.7 with the modified/fixed drbd OCF script I posted earlier.
> > 
> > Alastair Young
> > Director, Operations
> > Ludi labs
> > 399 West El Camino Real
> > Mountain View, CA 94040
> > Email: [EMAIL PROTECTED]
> > Direct: 650-241-0068
> > Mobile: 925-784-0812
> > 
> > -----Original Message-----
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED] On Behalf Of Alastair N.
> > Young
> > Sent: Monday, April 23, 2007 2:19 PM
> > To: General Linux-HA mailing list
> > Subject: RE: [Linux-HA] Cannot create group containing drbd using HB GUI
> > 
> > I'm also wrangling with this issue (getting the drbd OCF agent to work
> > in v2, and logically grouping the master role with the services that
> > run on it).
> > 
> > One thing I've run into so far is that there appear to be some bugs in
> > the drbd ocf script.
> > 
> > 1) In do_cmd(), the script declares "local cmd_out" on the same line
> > that runs the command, immediately before reading the result code from
> > $?. The "local" builtin itself always succeeds (on CentOS 4.4 32-bit,
> > anyway), so the real exit code is lost. Declaring the local variable on
> > an earlier line and assigning it separately makes the function return
> > the correct exit code from the drbdadm command. Since this return code
> > is used elsewhere, failure codes are now passed back as intended.
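
The effect of that fix can be shown in isolation. The functions below are
stripped-down stand-ins, not the actual agent code:

```shell
#!/bin/sh
# Stand-in for the agent's do_cmd(); not the real code.
# Bug: "local var=$(cmd)" makes $? reflect the local builtin (always 0),
# so the command's failure is swallowed.
buggy_do_cmd() {
    local cmd_out=$("$@" 2>&1)
    rc=$?                      # always 0 here
    return $rc
}

# Fix: declare the local first, assign on a separate line, then read $?.
fixed_do_cmd() {
    local cmd_out
    cmd_out=$("$@" 2>&1)
    rc=$?                      # real exit code of "$@"
    return $rc
}

buggy_do_cmd false && echo "buggy: reported success despite failure"
fixed_do_cmd false || echo "fixed: failure propagated"
```

The key point is that any builtin run after the command substitution,
including "local", overwrites $? before the caller can read it.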
> > 
> > 2) There needs to be a wait loop after the module is loaded, the same
> > as in the /etc/init.d/drbd script shipped with drbd. I inserted this
> > into drbd_start() (UDEV_TIMEOUT is set to 10 in the script header):
> > 
> >             # make sure udev has time to create the device files
> >             for RESOURCE in `$DRBDADM sh-resources`; do
> >                 for DEVICE in `$DRBDADM sh-dev $RESOURCE`; do
> >                     UDEV_TIMEOUT_LOCAL=$UDEV_TIMEOUT
> >                     while [ ! -e $DEVICE ] && [ $UDEV_TIMEOUT_LOCAL -gt 0 ]; do
> >                         sleep 1
> >                         UDEV_TIMEOUT_LOCAL=$(( UDEV_TIMEOUT_LOCAL - 1 ))
> >                     done
> >                 done
> >             done
> > 
> > It takes several seconds after the modload returns for the /dev/drbd0
> > device to appear - and nothing works until it does.
> > 
> > 3) A similar wait is needed in drbd_promote(), as drbdadm won't let
> > you go "Primary" until the other node is no longer "Primary". I found
> > that heartbeat was firing off the promote on "b" slightly before the
> > demote on "a" had completed, causing a failure.
> > 
> > I added this (REMOTE_DEMOTE_TIMEOUT is set to 10 in the script header):
> > 
> >  drbd_get_status
> >  DEMOTE_TIMEOUT_LOCAL=$REMOTE_DEMOTE_TIMEOUT
> >  while [ "x$DRBD_STATE_REMOTE" = "xPrimary" ] && [ $DEMOTE_TIMEOUT_LOCAL -gt 0 ]; do
> >     sleep 1
> >     DEMOTE_TIMEOUT_LOCAL=$(( DEMOTE_TIMEOUT_LOCAL - 1 ))
> >     drbd_get_status
> >  done
> > 
> > With these changes I was able to get drbd to start, stop and migrate
> > cleanly when I tweaked the location scores.
> > 
> > Getting the services dependent on that disk to do the same is still an
> > open question :-)
> > 
> > My modified drbd ocf script is attached, use at your own risk.
> > 
> > 
> > Alastair Young
> > Director, Operations
> > Ludi labs
> > 399 West El Camino Real
> > Mountain View, CA 94040
> > Email: [EMAIL PROTECTED]
> > Direct: 650-241-0068
> > Mobile: 925-784-0812
> > -----Original Message-----
> > From: [EMAIL PROTECTED]
> > [mailto:[EMAIL PROTECTED] On Behalf Of Martin Fick
> > Sent: Thursday, April 19, 2007 1:13 PM
> > To: General Linux-HA mailing list
> > Subject: Re: [Linux-HA] Cannot create group containing drbd using HB GUI
> > 
> > Hi Doug,
> > 
> > I personally could not get the DRBD OCF agent to work. I am using
> > drbd 0.7.x; what about you? I never tried a master/slave setup,
> > though. I created my own drbd OCF agent; it is on my site along with
> > the CIB scripts.
> > 
> > http://www.theficks.name/bin/lib/ocf/drbd
> > 
> > You can even use the drbd CIBs as a starting point if you want:
> > 
> > http://www.theficks.name/bin/lib/heartbeat/drbd
> > 
> > 
> > I just updated them all (CIBs and OCF agents) if you want to try
> > them out.  
> > 
> > 
> > -Martin
> > 
> > 
> > 
> > --- Doug Knight <[EMAIL PROTECTED]> wrote:
> > 
> > > I made the ID change indicated below (for the colocation
> > > constraints), and everything configured fine using cibadmin. Now, I
> > > started JUST the drbd master/slave resource, with the rsc_location
> > > rule setting the expression uname to one of the two nodes in the
> > > cluster. Both drbd processes come up and sync up the partition, but
> > > both are still in slave/secondary mode (i.e. the rsc_location rule
> > > did not cause a promotion). Am I missing something here? This is the
> > > rsc_location constraint:
> > > 
> > > <rsc_location id="locate_drbd" rsc="rsc_drbd_7788">
> > >         <rule id="rule_drbd_on_dk" role="master" score="100">
> > >                 <expression id="exp_drbd_on_dk" attribute="#uname" operation="eq" value="arc-dknightlx"/>
> > >         </rule>
> > > </rsc_location>
> > > 
> > > (By the way, the example on the Idioms/MasterConstraints web page
> > > does not have an ID specified in the expression tag, so I added one
> > > to mine.)
> > > 
> > > Doug
> > > 
> > > On Thu, 2007-04-19 at 13:04 -0400, Doug Knight wrote:
> > > 
> > > > ...
> > > > 
> > > > > > For example:
> > > > > > <rsc_location id="drbd1_loc_nodeA" rsc="drbd1">
> > > > > >     <rule id="pref_drbd1_loc_nodeA" score="600">
> > > > > >         <expression attribute="#uname" operation="eq" value="nodeA" id="pref_drbd1_loc_nodeA_attr"/>
> > > > > >     </rule>
> > > > > >     <rule id="pref_drbd1_loc_nodeB" score="800">
> > > > > >         <expression attribute="#uname" operation="eq" value="nodeB" id="pref_drbd1_loc_nodeB_attr"/>
> > > > > >     </rule>
> > > > > > </rsc_location>
> > > > > >
> > > > > > In this case, nodeB will be primary for resource drbd1. Is that
> > > > > > what you were looking for?
> > > > >
> > > > > Not like this, not when using the drbd OCF Resource Agent as a
> > > > > master-slave one. In that case, you need to bind the rsc_location
> > > > > to role=Master as well.
> > > >
> > > > I was missing this in the CIB idioms page.  I just added it.
> > > >
> > > >        http://linux-ha.org/CIB/Idioms
> > > > 
> > > > 
> > > > I tried setting up colocation constraints similar to those shown
> > > > in the example referenced in the URL above, and it complained about
> > > > the identical ids:
> > > > 
> > > > [EMAIL PROTECTED] xml]# more rule_fs_on_drbd_slave.xml
> > > > <rsc_colocation id="fs_on_drbd" to="rsc_drbd_7788" to_role="slave" from="fs_mirror" score="-infinity"/>
> > > > 
> > > > [EMAIL PROTECTED] xml]# more rule_fs_on_drbd_stopped.xml
> > > > <rsc_colocation id="fs_on_drbd" to="rsc_drbd_7788" to_role="stopped" from="fs_mirror" score="-infinity"/>
> > > > 
> > > > [EMAIL PROTECTED] xml]# cibadmin -o constraints -C -x rule_fs_on_drbd_stopped.xml
> > > > 
> > > > [EMAIL PROTECTED] xml]# cibadmin -o constraints -C -x rule_fs_on_drbd_slave.xml
> > > > Call cib_create failed (-21): The object already exists
> > > >  <failed>
> > > >    <failed_update id="fs_on_drbd" object_type="rsc_colocation" operation="add" reason="The object already exists">
> > > >      <rsc_colocation id="fs_on_drbd" to="rsc_drbd_7788" to_role="slave" from="fs_mirror" score="-infinity"/>
> > > >    </failed_update>
> > > >  </failed>
> > > > 
> > > > I'm going to change the ids to be unique and try again, but wanted
> > > > to point this out since it is very similar to the example on the
> > > > web page.
> > > > 
> > > > Doug
> > > > 
> > > > 
> > > > 
> > > >        http://linux-ha.org/CIB/Idioms/MasterConstraints
> > > _______________________________________________
> > > Linux-HA mailing list
> > > [email protected]
> > > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > > See also: http://linux-ha.org/ReportingProblems
> > > 
> > 