On Fri, Apr 27, 2007 at 03:10:22PM -0400, Doug Knight wrote:
> I now have a working configuration with DRBD master/slave, and a
> filesystem/pgsql/ipaddr group following it around. So far, I've been
> using a Place constraint and modifying its uname value to test the "fail
> over" of the resources. Can someone suggest a reasonable set of tests
> that most do to verify other possible error conditions (short of pulling
> the plug on one of the servers)?

You can run CTS with your configuration. Otherwise, stopping
heartbeat in a way that it doesn't notice being stopped (kill -9)
simulates the "pull power plug" condition. You'd also want to
make various resources fail.
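
A rough sketch of the kill -9 test, not a vetted procedure. It assumes the daemon processes are named "heartbeat" and that killall is installed; DRY_RUN and crash_heartbeat are names made up for this example. Child daemons started by heartbeat may need the same treatment, which this sketch does not attempt.

```shell
#!/bin/sh
# Simulate an abrupt node failure: SIGKILL gives heartbeat no chance
# to shut down cleanly or to tell the peer that it is leaving.
# DRY_RUN defaults to 1 so that running this by accident is harmless.
DRY_RUN=${DRY_RUN:-1}

crash_heartbeat() {
    if [ "$DRY_RUN" = "1" ]; then
        # Only show what would be run.
        echo "would run: killall -9 heartbeat"
    else
        killall -9 heartbeat
    fi
}

crash_heartbeat
```

Run it with DRY_RUN=0 on the node you want to "unplug", then watch crm_mon on the surviving node.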

> Also, the Place constraint is on the
> DRBD master/slave, does that make sense or should it be placed on one of
> the "higher level" resources like the file system or pgsql?

I don't think it matters; you can go with either, given that the
resources are collocated.

> Thanks,
> Doug
> 
> On Thu, 2007-04-26 at 09:45 -0400, Doug Knight wrote:
> 
> > Hi Alastair,
> > Have you encountered a situation where when you first start up the drbd
> > master/slave resource, crm_mon and/or the GUI indicate Master status on
> > one node, and Started status on the other (as opposed to Slave)? If so,
> > how did you correct it?
> > 
> > Doug
> > p.s. Thanks for the scripts and xml, they're a big help!
> > 
> > On Mon, 2007-04-23 at 16:41 -0700, Alastair N. Young wrote:
> > 
> > > Attached is the cib I am using. By adjusting the scores on the
> > > drbd_m_like_ rules I can migrate the drbd master between nodes, and the
> > > filesystem cleanly dismounts first and remounts on the new master after.
> > > 
> > > What I also need it to do is to migrate the services in response to a
> > > failure or other score change of the grp_www group. I've tried many
> > > permutations and I can't figure this out. The best I come up with is
> > > failure of the rsc_www_fs resource in situ after I manually dismount it
> > > a few times. At worst, Bad Things Happen. 
> > > 
> > > As best as I can guess grp_www won't move to the slave node no matter
> > > what. Perhaps because of the -INFINITY in the colocation? 
> > > 
> > > What I need is to have the other node become master and then have
> > > grp_www start on it. Essentially I need the master state of drbd-ms to
> > > effectively be the first member of grp_www. I know that cannot be done
> > > overtly, but how does one get that effect?
> > > 
> > > What's the incantation to get the master_slave to change master in
> > > response to failure/scorechange on a collocated service?
> > > 
> > > I am running hb2.0.8 on CentOS4.4 i386 running under vmware.
> > > Drbd is v0.7 with the modified/fixed drbd ocf script I posted earlier.
> > > 
> > > Alastair Young
> > > Director, Operations
> > > Ludi labs
> > > 399 West El Camino Real
> > > Mountain View, CA 94040
> > > Email: [EMAIL PROTECTED]
> > > Direct: 650-241-0068
> > > Mobile: 925-784-0812
> > > 
> > > -----Original Message-----
> > > From: [EMAIL PROTECTED]
> > > [mailto:[EMAIL PROTECTED] On Behalf Of Alastair N.
> > > Young
> > > Sent: Monday, April 23, 2007 2:19 PM
> > > To: General Linux-HA mailing list
> > > Subject: RE: [Linux-HA] Cannot create group containing drbd using HB GUI
> > > 
> > > I'm also wrangling with this issue (getting drbd OCF to work in V2,
> > > logically grouping master mode with the services that are on it)
> > > 
> > > One thing I've run into so far is that there appear to be some bugs in
> > > the drbd ocf script.
> > > 
> > > > 1) In do_cmd() it uses "local cmd_out" immediately before taking the
> > > > result code from $?. The "local" always succeeds (on CentOS 4.4 32 bit
> > > > anyway), so $? is always 0 at that point. Declaring cmd_out local on an
> > > > earlier line lets the function return the real return code of the
> > > > drbdadm command. As this return code is used elsewhere, it helps that
> > > > failure codes are passed back as intended.
> > > 
> > > 2) There needs to be a wait loop after the module is loaded, same as is
> > > in the drbd distributed /etc/init.d/drbd script. I inserted this into
> > > drbd_start() (UDEV_TIMEOUT is set in the script header to 10)
> > > 
> > >             # make sure udev has time to create the device files
> > >             for RESOURCE in `$DRBDADM sh-resources`; do
> > >                 for DEVICE in `$DRBDADM sh-dev $RESOURCE`; do
> > >                     UDEV_TIMEOUT_LOCAL=$UDEV_TIMEOUT
> > >                     while [ ! -e $DEVICE ] && [ $UDEV_TIMEOUT_LOCAL -gt
> > > 0 ] ; do
> > >                         sleep 1
> > >                         UDEV_TIMEOUT_LOCAL=$(( $UDEV_TIMEOUT_LOCAL-1 ))
> > >                     done
> > >                 done
> > >             done
> > > 
> > > It takes several seconds after the modload returns for the /dev/drbd0
> > > device to appear - and nothing works until it does.
> > > 
> > > > 3) A similar timer is needed in drbd_promote as drbdadm won't let you
> > > > "Primary" until the other is not "Primary". I found that heartbeat was
> > > > firing off the promote on "b" slightly before the "demote" on "a",
> > > > causing a failure.
> > > 
> > > I added this: (REMOTE_DEMOTE_TIMEOUT is set in the script header to 10)
> > > 
> > >  drbd_get_status
> > >  DEMOTE_TIMEOUT_LOCAL=$REMOTE_DEMOTE_TIMEOUT
> > >  while [ "x$DRBD_STATE_REMOTE" = "xPrimary" ] && [ $DEMOTE_TIMEOUT_LOCAL
> > > -gt 0 ] ; do
> > >     sleep 1
> > >     DEMOTE_TIMEOUT_LOCAL=$(( $DEMOTE_TIMEOUT_LOCAL-1 ))
> > >     drbd_get_status
> > >  done
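
The two wait loops quoted above (for the device file and for the remote demote) share one pattern: poll once per second until a condition holds or a timeout runs out. A minimal sketch of that pattern as a reusable helper follows. wait_until is a hypothetical name, not part of the drbd OCF script, and unlike the quoted loops it reports a timeout through its return code.

```shell
#!/bin/sh
# wait_until TIMEOUT CMD [ARGS...]
# Retry CMD once per second until it succeeds or TIMEOUT seconds have
# passed. Returns 0 if CMD succeeded, 1 on timeout.
# Note: "remaining" is a global variable here, to stay within plain sh.
wait_until() {
    remaining=$1
    shift
    while ! "$@"; do
        [ "$remaining" -gt 0 ] || return 1
        sleep 1
        remaining=$((remaining - 1))
    done
    return 0
}

# Hypothetical use in the spirit of the udev loop:
#   wait_until "$UDEV_TIMEOUT" test -e "$DEVICE"
wait_until 2 test -e / && echo "condition met"
wait_until 1 false || echo "timed out"
```

Returning non-zero on timeout lets the caller fail the start or promote operation instead of carrying on with a device that never appeared.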
> > > 
> > > With these changes I was able to get drbd to start, stop and migrate
> > > cleanly when I tweaked the location scores.
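
The behaviour described in point 1) above is easy to reproduce outside the drbd script. The sketch below assumes the original do_cmd() used the common "local var=$(command)" form; the function names masked and preserved are made up for this illustration. The "local" builtin has its own exit status, and that status is what ends up in $?.

```shell
#!/bin/bash
# Shows how "local" hides the exit status of a command substitution.

masked() {
    # $? after this line is the status of "local", not of "false".
    local out=$(false)
    return $?
}

preserved() {
    # Declare first, assign afterwards: $? is now the status of "false".
    local out
    out=$(false)
    return $?
}

masked;    echo "masked: $?"
preserved; echo "preserved: $?"
```

On bash this prints "masked: 0" followed by "preserved: 1", which matches the fix of declaring the variable local on an earlier line.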
> > > 
> > > Getting the services dependent on that disk to do the same is still an
> > > open question :-)
> > > 
> > > My modified drbd ocf script is attached, use at your own risk.
> > > 
> > > 
> > > Alastair Young
> > > Director, Operations
> > > Ludi labs
> > > 399 West El Camino Real
> > > Mountain View, CA 94040
> > > Email: [EMAIL PROTECTED]
> > > Direct: 650-241-0068
> > > Mobile: 925-784-0812
> > > -----Original Message-----
> > > From: [EMAIL PROTECTED]
> > > [mailto:[EMAIL PROTECTED] On Behalf Of Martin Fick
> > > Sent: Thursday, April 19, 2007 1:13 PM
> > > To: General Linux-HA mailing list
> > > Subject: Re: [Linux-HA] Cannot create group containing drbd using HB GUI
> > > 
> > > Hi Doug,
> > > 
> > > I personally could not get the DRBD OCF to work, I am
> > > using drbd .7x, what about you?  I never tried a
> > > master/slave setup though.  I created my own drbd OCF,
> > > it is on my site along with the CIB scripts.
> > > 
> > > http://www.theficks.name/bin/lib/ocf/drbd
> > > 
> > > You can even use the drbd CIBS as a starting place if
> > > you want:
> > > 
> > > http://www.theficks.name/bin/lib/heartbeat/drbd
> > > 
> > > 
> > > I just updated them all (CIBS and OCF agents) if you
> > > want to try them out.  
> > > 
> > > 
> > > -Martin
> > > 
> > > 
> > > 
> > > --- Doug Knight <[EMAIL PROTECTED]> wrote:
> > > 
> > > > > I made the ID change indicated below (for the colocation constraints),
> > > > > and everything configured fine using cibadmin. Now, I started JUST the
> > > > > drbd master/slave resource, with the rsc_location rule setting the
> > > > > expression uname to one of the two nodes in the cluster. Both drbd
> > > > > processes come up and sync up the partition, but both are still in
> > > > > slave/secondary mode (i.e. the rsc_location rule did not cause a
> > > > > promotion). Am I missing something here? This is the rsc_location
> > > > > constraint:
> > > > > 
> > > > > <rsc_location id="locate_drbd" rsc="rsc_drbd_7788">
> > > > >         <rule id="rule_drbd_on_dk" role="master" score="100">
> > > > >                 <expression id="exp_drbd_on_dk" attribute="#uname" operation="eq" value="arc-dknightlx"/>
> > > > >         </rule>
> > > > > </rsc_location>
> > > > > 
> > > > > (By the way, the example from Idioms/MasterConstraints web page does not
> > > > > have an ID specified in the expression tag, so I added one to mine.)
> > > > > Doug
> > > > > On Thu, 2007-04-19 at 13:04 -0400, Doug Knight wrote:
> > > > 
> > > > > ...
> > > > > 
> > > > > > > > > > >>>> For example
> > > > > > > > > > >>>> <rsc_location id="drbd1_loc_nodeA" rsc="drbd1">
> > > > > > > > > > >>>>     <rule id="pref_drbd1_loc_nodeA" score="600">
> > > > > > > > > > >>>>          <expression attribute="#uname" operation="eq" value="nodeA" id="pref_drbd1_loc_nodeA_attr"/>
> > > > > > > > > > >>>>     </rule>
> > > > > > > > > > >>>>     <rule id="pref_drbd1_loc_nodeB" score="800">
> > > > > > > > > > >>>>          <expression attribute="#uname" operation="eq" value="nodeB" id="pref_drbd1_loc_nodeB_attr"/>
> > > > > > > > > > >>>>     </rule>
> > > > > > > > > > >>>> </rsc_location>
> > > > > > > > > > >>>>
> > > > > > > > > > >>>> In this case, nodeB will be primary for resource drbd1. Is that what you
> > > > > > > > > > >>>> were looking for?
> > > > > > > > > > >>> Not like this, not when using the drbd OCF Resource Agent as a
> > > > > > > > > > >>> master-slave one. In that case, you need to bind the rsc_location to the
> > > > > > > > > > >>> role=Master as well.
> > > > > > > > > > >> I was missing this in the CIB idioms page.  I just added it.
> > > > > > > > > > >>
> > > > > > > > > > >>      http://linux-ha.org/CIB/Idioms
> > > > > 
> > > > > 
> > > > > > I tried setting up colocation constraints similar to those shown in the
> > > > > > example referenced in the URL above, and it complained about the
> > > > > > identical ids:
> > > > > > 
> > > > > > [EMAIL PROTECTED] xml]# more rule_fs_on_drbd_slave.xml
> > > > > > <rsc_colocation id="fs_on_drbd" to="rsc_drbd_7788" to_role="slave"
> > > > > > from="fs_mirror" score="-infinity"/>
> > > > > > 
> > > > > > [EMAIL PROTECTED] xml]# more rule_fs_on_drbd_stopped.xml
> > > > > > <rsc_colocation id="fs_on_drbd" to="rsc_drbd_7788" to_role="stopped"
> > > > > > from="fs_mirror" score="-infinity"/>
> > > > > > 
> > > > > > [EMAIL PROTECTED] xml]# cibadmin -o constraints -C -x rule_fs_on_drbd_stopped.xml
> > > > > > 
> > > > > > [EMAIL PROTECTED] xml]# cibadmin -o constraints -C -x rule_fs_on_drbd_slave.xml
> > > > > > Call cib_create failed (-21): The object already exists
> > > > > >  <failed>
> > > > > >    <failed_update id="fs_on_drbd" object_type="rsc_colocation"
> > > > > > operation="add" reason="The object already exists">
> > > > > >      <rsc_colocation id="fs_on_drbd" to="rsc_drbd_7788" to_role="slave"
> > > > > > from="fs_mirror" score="-infinity"/>
> > > > > >    </failed_update>
> > > > > >  </failed>
> > > > > > 
> > > > > > I'm going to change the ids to be unique and try again, but wanted to
> > > > > > point this out since it is very similar to the example on the web page.
> > > > > > 
> > > > > > Doug
> > > > > 
> > > > > 
> > > > > 
> > > > > > > > > > >>      http://linux-ha.org/CIB/Idioms/MasterConstraints
> > > > > > > > > > >
> > > > > > > > > > _______________________________________________
> > > > > > > > > > Linux-HA mailing list
> > > > > > > > > > [email protected]
> > > > > > > > > > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > > > > > > > > > See also: http://linux-ha.org/ReportingProblems
> > > 
> > > 
> > > 
> > > --- Doug Knight <[EMAIL PROTECTED]> wrote:
> > > 
> > > > I made the ID change indicated below (for the
> > > > colocation constraints),
> > > > and everything configured fine using cibadmin. Now,
> > > > I started JUST the
> > > > drbd master/slave resource, with the rsc_location
> > > > rule setting the
> > > > expression uname to one of the two nodes in the
> > > > cluster. Both drbd
> > > > processes come up and sync up the partition, but
> > > > both are still in
> > > > slave/secondary mode (i.e. the rsc_location rule did
> > > > not cause a
> > > > promotion). Am I missing something here? This is the
> > > > rsc_location
> > > > constraint:
> > > > 
> > > > <rsc_location id="locate_drbd" rsc="rsc_drbd_7788">
> > > >         <rule id="rule_drbd_on_dk" role="master"
> > > > score="100">
> > > >                 <expression id="exp_drbd_on_dk"
> > > > attribute="#uname"
> > > > operation="eq" value="arc-dknightlx"/>
> > > >         </rule>
> > > > </rsc_location>
> > > > 
> > > > (By the way, the example from
> > > > Idioms/MasterConstraints web page does not
> > > > have an ID specified in the expression tag, so I
> > > > added one to mine.)
> > > > Doug
> > > > On Thu, 2007-04-19 at 13:04 -0400, Doug Knight
> > > > wrote:
> > > > 
> > > > > ...
> > > > > 
> > > > > > > > >>     
> > > > > > > > >>>> For exemple
> > > > > > > > >>>> <rsc_location id="drbd1_loc_nodeA"
> > > > rsc="drbd1">
> > > > > > > > >>>>     <rule id="pref_drbd1_loc_nodeA"
> > > > score="600">
> > > > > > > > >>>>          <expression attribute="#uname"
> > > > operation="eq" value="nodeA" 
> > > > > > > > >>>> id="pref_drbd1_loc_nodeA_attr"/>
> > > > > > > > >>>>     </rule>
> > > > > > > > >>>>     <rule id="pref_drbd1_loc_nodeB"
> > > > score="800">
> > > > > > > > >>>>          <expression attribute="#uname"
> > > > operation="eq" value="nodeB" 
> > > > > > > > >>>> id="pref_drbd1_loc_nodeB_attr"/>
> > > > > > > > >>>>     </rule>
> > > > > > > > >>>> </rsc_location>
> > > > > > > > >>>>
> > > > > > > > >>>> In this case, nodeB will be primary for
> > > > resource drbd1. Is that what
> > > > > > > > >>>>         
> > > > > > > > >> you 
> > > > > > > > >>     
> > > > > > > > >>>> were looking for ?
> > > > > > > > >>>>         
> > > > > > > > >>> Not like this, not when using the drbd
> > > > OCF Resource Agent as a
> > > > > > > > >>> master-slave one. In that case, you need
> > > > to bind the rsc_location to
> > > > > > > > >>>       
> > > > > > > > >> the
> > > > > > > > >>     
> > > > > > > > >>> role=Master as well.
> > > > > > > > >>>       
> > > > > > > > >> I was missing this in the CIB idioms
> > > > page.  I just added it.
> > > > > > > > >>
> > > > > > > > >>      http://linux-ha.org/CIB/Idioms
> > > > > 
> > > > > 
> > > > > I tried setting up colocation constraints similar
> > > > to those shown in the
> > > > > example referenced in the URL above, and it
> > > > complained about the
> > > > > identical ids:
> > > > > 
> > > > > [EMAIL PROTECTED] xml]# more
> > > > rule_fs_on_drbd_slave.xml 
> > > > > <rsc_colocation id="fs_on_drbd" to="rsc_drbd_7788"
> > > > to_role="slave"
> > > > > from="fs_mirror" score="-infinity"/>
> > > > > 
> > > > > [EMAIL PROTECTED] xml]# more
> > > > rule_fs_on_drbd_stopped.xml 
> > > > > <rsc_colocation id="fs_on_drbd" to="rsc_drbd_7788"
> > > > to_role="stopped"
> > > > > from="fs_mirror" score="-infinity"/>
> > > > > 
> > > > > [EMAIL PROTECTED] xml]# cibadmin -o constraints
> > > > -C -x
> > > > > rule_fs_on_drbd_stopped.xml 
> > > > > 
> > > > > [EMAIL PROTECTED] xml]# cibadmin -o constraints
> > > > -C -x
> > > > > rule_fs_on_drbd_slave.xml 
> > > > > Call cib_create failed (-21): The object already
> > > > exists
> > > > >  <failed>
> > > > >    <failed_update id="fs_on_drbd"
> > > > object_type="rsc_colocation"
> > > > > operation="add" reason="The object already
> > > > exists">
> > > > >      <rsc_colocation id="fs_on_drbd"
> > > > to="rsc_drbd_7788" to_role="slave"
> > > > > from="fs_mirror" score="-infinity"/>
> > > > >    </failed_update>
> > > > >  </failed>
> > > > > 
> > > > > I'm going to change the ids to be unique and try
> > > > again, but wanted to
> > > > > point this out since it is very similar to the
> > > > example on the web page.
> > > > > 
> > > > > Doug
> > > > > 
> > > > > 
> > > > > 
> > > > > > > > >> 
> > > > http://linux-ha.org/CIB/Idioms/MasterConstraints
> > > 
> > > 

-- 
Dejan