On Fri, Apr 27, 2007 at 03:10:22PM -0400, Doug Knight wrote:
> I now have a working configuration with DRBD master/slave, and a
> filesystem/pgsql/ipaddr group following it around. So far, I've been
> using a Place constraint and modifying its uname value to test the
> "fail over" of the resources. Can someone suggest a reasonable set of
> tests that most people do to verify other possible error conditions
> (short of pulling the plug on one of the servers)?
You can run CTS with your configuration. Otherwise, stopping heartbeat
in a way that it doesn't notice being stopped (kill -9) simulates the
"pull the power plug" condition. You'd also want to make various
resources fail.

> Also, the Place constraint is on the DRBD master/slave; does that
> make sense, or should it be placed on one of the "higher level"
> resources like the filesystem or pgsql?

I don't think it matters; you can go with either, given that the
resources are collocated.

> Thanks,
> Doug
>
> On Thu, 2007-04-26 at 09:45 -0400, Doug Knight wrote:
> >
> > Hi Alastair,
> > Have you encountered a situation where, when you first start up the
> > drbd master/slave resource, crm_mon and/or the GUI indicate Master
> > status on one node and Started status on the other (as opposed to
> > Slave)? If so, how did you correct it?
> >
> > Doug
> > p.s. Thanks for the scripts and xml, they're a big help!
> >
> > On Mon, 2007-04-23 at 16:41 -0700, Alastair N. Young wrote:
> > >
> > > Attached is the cib I am using. By adjusting the scores on the
> > > drbd_m_like_ rules I can migrate the drbd master between nodes,
> > > and the filesystem cleanly dismounts first and remounts on the
> > > new master afterwards.
> > >
> > > What I also need it to do is migrate the services in response to
> > > a failure or other score change of the grp_www group. I've tried
> > > many permutations and I can't figure this out. The best I come up
> > > with is failure of the rsc_www_fs resource in situ after I
> > > manually dismount it a few times. At worst, Bad Things Happen.
> > >
> > > As best as I can guess, grp_www won't move to the slave node no
> > > matter what. Perhaps because of the -INFINITY in the colocation?
> > >
> > > What I need is to have the other node become master and then have
> > > grp_www start on it. Essentially I need the master state of
> > > drbd-ms to effectively be the first member of grp_www. I know
> > > that cannot be done overtly, but how does one get that effect?
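[Editor's note: the tests Dejan suggests near the top of this message
(kill -9 so heartbeat cannot clean up, plus forcing individual
resources to fail) could be collected into a small dry-run script.
This is only a sketch; the process names, mount point, and DRY_RUN
convention are illustrative assumptions, not part of any Linux-HA
tooling.]

```shell
#!/bin/bash
# Dry-run sketch of manual cluster failure-injection tests.
# All names below (heartbeat, postmaster, /mnt/drbd) are examples only.
DRY_RUN=${DRY_RUN:-1}

run() {
    # With DRY_RUN=1 (the default), only print what would be executed.
    if [ "$DRY_RUN" -eq 1 ]; then
        echo "WOULD RUN: $*"
    else
        "$@"
    fi
}

# 1) "Pulled power plug": kill heartbeat so it cannot announce shutdown.
test_hard_kill()  { run killall -9 heartbeat; }

# 2) Resource failure: yank the filesystem out from under the group.
test_fs_failure() { run umount -l /mnt/drbd; }

# 3) Application failure: the monitor operation should detect this.
test_app_failure() { run killall -9 postmaster; }

test_hard_kill
test_fs_failure
test_app_failure
```

Run once with DRY_RUN=1 to review the actions, then with DRY_RUN=0 on
a node whose failure you can tolerate, watching crm_mon after each
injection to confirm the resources fail over as expected.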
> > > What's the incantation to get the master_slave to change master
> > > in response to a failure/score change on a collocated service?
> > >
> > > I am running hb2.0.8 on CentOS4.4 i386 running under vmware.
> > > Drbd is v0.7 with the modified/fixed drbd ocf script I posted
> > > earlier.
> > >
> > > Alastair Young
> > > Director, Operations
> > > Ludi labs
> > > 399 West El Camino Real
> > > Mountain View, CA 94040
> > > Email: [EMAIL PROTECTED]
> > > Direct: 650-241-0068
> > > Mobile: 925-784-0812
> > >
> > > -----Original Message-----
> > > From: [EMAIL PROTECTED]
> > > [mailto:[EMAIL PROTECTED] On Behalf Of Alastair N. Young
> > > Sent: Monday, April 23, 2007 2:19 PM
> > > To: General Linux-HA mailing list
> > > Subject: RE: [Linux-HA] Cannot create group containing drbd using
> > > HB GUI
> > >
> > > I'm also wrangling with this issue (getting the drbd OCF to work
> > > in V2, logically grouping master mode with the services that are
> > > on it).
> > >
> > > One thing I've run into so far is that there appear to be some
> > > bugs in the drbd ocf script.
> > >
> > > 1) In do_cmd() it uses "local cmd_out" immediately before taking
> > > the result code from $?. This always succeeds (on CentOS 4.4
> > > 32-bit, anyway), because $? then reflects the "local" builtin
> > > rather than the drbdadm command. Declaring the local on an
> > > earlier line makes the function return the correct return code
> > > from the drbdadm command. As this return code is used elsewhere,
> > > it helps that failure codes are passed back as intended.
> > >
> > > 2) There needs to be a wait loop after the module is loaded, the
> > > same as in the /etc/init.d/drbd script distributed with drbd. I
> > > inserted this into drbd_start() (UDEV_TIMEOUT is set in the
> > > script header to 10):
> > >
> > >     # make sure udev has time to create the device files
> > >     for RESOURCE in `$DRBDADM sh-resources`; do
> > >         for DEVICE in `$DRBDADM sh-dev $RESOURCE`; do
> > >             UDEV_TIMEOUT_LOCAL=$UDEV_TIMEOUT
> > >             while [ ! -e $DEVICE ] && [ $UDEV_TIMEOUT_LOCAL -gt 0 ]; do
> > >                 sleep 1
> > >                 UDEV_TIMEOUT_LOCAL=$(( $UDEV_TIMEOUT_LOCAL - 1 ))
> > >             done
> > >         done
> > >     done
> > >
> > > It takes several seconds after the modload returns for the
> > > /dev/drbd0 device to appear, and nothing works until it does.
> > >
> > > 3) A similar timer is needed in drbd_promote, as drbdadm won't
> > > let you go "Primary" until the other node is no longer "Primary".
> > > I found that heartbeat was firing off the promote on "b" slightly
> > > before the demote on "a", causing a failure.
> > >
> > > I added this (REMOTE_DEMOTE_TIMEOUT is set in the script header
> > > to 10):
> > >
> > >     drbd_get_status
> > >     DEMOTE_TIMEOUT_LOCAL=$REMOTE_DEMOTE_TIMEOUT
> > >     while [ "x$DRBD_STATE_REMOTE" = "xPrimary" ] && [ $DEMOTE_TIMEOUT_LOCAL -gt 0 ]; do
> > >         sleep 1
> > >         DEMOTE_TIMEOUT_LOCAL=$(( $DEMOTE_TIMEOUT_LOCAL - 1 ))
> > >         drbd_get_status
> > >     done
> > >
> > > With these changes I was able to get drbd to start, stop, and
> > > migrate cleanly when I tweaked the location scores.
> > >
> > > Getting the services dependent on that disk to do the same is
> > > still an open question :-)
> > >
> > > My modified drbd ocf script is attached; use at your own risk.
> > >
> > > Alastair Young
> > >
> > > -----Original Message-----
> > > From: [EMAIL PROTECTED]
> > > [mailto:[EMAIL PROTECTED] On Behalf Of Martin Fick
> > > Sent: Thursday, April 19, 2007 1:13 PM
> > > To: General Linux-HA mailing list
> > > Subject: Re: [Linux-HA] Cannot create group containing drbd using
> > > HB GUI
> > >
> > > Hi Doug,
> > >
> > > I personally could not get the DRBD OCF to work. I am using drbd
> > > 0.7x, what about you? I never tried a master/slave setup though.
> > > I created my own drbd OCF; it is on my site along with the CIB
> > > scripts.
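[Editor's note: bug (1) in the drbd OCF script quoted above is easy to
reproduce outside the resource agent. A minimal demonstration; the
stub function stands in for the drbdadm invocation and is not from the
actual script.]

```shell
#!/bin/bash
# 'local' is itself a shell builtin; running it between a command and
# the inspection of $? replaces the command's exit status with 0.
drbd_cmd_stub() { return 1; }    # pretend drbdadm failed

buggy() {
    drbd_cmd_stub
    local cmd_out                # succeeds, so $? becomes 0
    return $?                    # the stub's failure is lost
}

fixed() {
    local cmd_out                # declare on an earlier line instead
    drbd_cmd_stub
    return $?                    # $? still holds the stub's status
}

buggy; echo "buggy exit: $?"     # prints "buggy exit: 0"
fixed; echo "fixed exit: $?"     # prints "fixed exit: 1"
```

The same masking happens with any builtin or assignment placed between
the command and the `$?` check, which is why the fix is simply to move
the declaration up.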
> > > http://www.theficks.name/bin/lib/ocf/drbd
> > >
> > > You can even use the drbd CIBs as a starting place if you want:
> > >
> > > http://www.theficks.name/bin/lib/heartbeat/drbd
> > >
> > > I just updated them all (CIBs and OCF agents) if you want to try
> > > them out.
> > >
> > > -Martin
> > >
> > > --- Doug Knight <[EMAIL PROTECTED]> wrote:
> > > > I made the ID change indicated below (for the colocation
> > > > constraints), and everything configured fine using cibadmin.
> > > > Now, I started JUST the drbd master/slave resource, with the
> > > > rsc_location rule setting the expression uname to one of the
> > > > two nodes in the cluster. Both drbd processes come up and sync
> > > > up the partition, but both are still in slave/secondary mode
> > > > (i.e. the rsc_location rule did not cause a promotion). Am I
> > > > missing something here? This is the rsc_location constraint:
> > > >
> > > >     <rsc_location id="locate_drbd" rsc="rsc_drbd_7788">
> > > >         <rule id="rule_drbd_on_dk" role="master" score="100">
> > > >             <expression id="exp_drbd_on_dk" attribute="#uname"
> > > >                 operation="eq" value="arc-dknightlx"/>
> > > >         </rule>
> > > >     </rsc_location>
> > > >
> > > > (By the way, the example from the Idioms/MasterConstraints web
> > > > page does not have an ID specified in the expression tag, so I
> > > > added one to mine.)
> > > >
> > > > Doug
> > > >
> > > > On Thu, 2007-04-19 at 13:04 -0400, Doug Knight wrote:
> > > > > ...
> > > > > >>>> For example:
> > > > > >>>>
> > > > > >>>>     <rsc_location id="drbd1_loc_nodeA" rsc="drbd1">
> > > > > >>>>         <rule id="pref_drbd1_loc_nodeA" score="600">
> > > > > >>>>             <expression attribute="#uname" operation="eq"
> > > > > >>>>                 value="nodeA" id="pref_drbd1_loc_nodeA_attr"/>
> > > > > >>>>         </rule>
> > > > > >>>>         <rule id="pref_drbd1_loc_nodeB" score="800">
> > > > > >>>>             <expression attribute="#uname" operation="eq"
> > > > > >>>>                 value="nodeB" id="pref_drbd1_loc_nodeB_attr"/>
> > > > > >>>>         </rule>
> > > > > >>>>     </rsc_location>
> > > > > >>>>
> > > > > >>>> In this case, nodeB will be primary for resource drbd1.
> > > > > >>>> Is that what you were looking for?
> > > > > >>>
> > > > > >>> Not like this, not when using the drbd OCF Resource Agent
> > > > > >>> as a master-slave one. In that case, you need to bind the
> > > > > >>> rsc_location to the role=Master as well.
> > > > > >>
> > > > > >> I was missing this in the CIB idioms page. I just added it.
> > > > > >> http://linux-ha.org/CIB/Idioms
> > > > >
> > > > > I tried setting up colocation constraints similar to those
> > > > > shown in the example referenced in the URL above, and it
> > > > > complained about the identical ids:
> > > > >
> > > > >     [EMAIL PROTECTED] xml]# more rule_fs_on_drbd_slave.xml
> > > > >     <rsc_colocation id="fs_on_drbd" to="rsc_drbd_7788"
> > > > >         to_role="slave" from="fs_mirror" score="-infinity"/>
> > > > >
> > > > >     [EMAIL PROTECTED] xml]# more rule_fs_on_drbd_stopped.xml
> > > > >     <rsc_colocation id="fs_on_drbd" to="rsc_drbd_7788"
> > > > >         to_role="stopped" from="fs_mirror" score="-infinity"/>
> > > > >
> > > > >     [EMAIL PROTECTED] xml]# cibadmin -o constraints -C -x rule_fs_on_drbd_stopped.xml
> > > > >
> > > > >     [EMAIL PROTECTED] xml]# cibadmin -o constraints -C -x rule_fs_on_drbd_slave.xml
> > > > >     Call cib_create failed (-21): The object already exists
> > > > >     <failed>
> > > > >         <failed_update id="fs_on_drbd" object_type="rsc_colocation"
> > > > >             operation="add" reason="The object already exists">
> > > > >             <rsc_colocation id="fs_on_drbd" to="rsc_drbd_7788"
> > > > >                 to_role="slave" from="fs_mirror" score="-infinity"/>
> > > > >         </failed_update>
> > > > >     </failed>
> > > > >
> > > > > I'm going to change the ids to be unique and try again, but
> > > > > wanted to point this out since it is very similar to the
> > > > > example on the web page.
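[Editor's note: the fix Doug goes on to apply, giving each colocation
constraint its own id, would look something like the following. The
file names follow his transcript; the new id values are illustrative.]

```shell
#!/bin/bash
# Recreate the two colocation constraints with unique ids; the CIB
# rejects a second object whose id already exists (error -21 above).
cat > rule_fs_on_drbd_slave.xml <<'EOF'
<rsc_colocation id="fs_on_drbd_slave" to="rsc_drbd_7788"
    to_role="slave" from="fs_mirror" score="-infinity"/>
EOF

cat > rule_fs_on_drbd_stopped.xml <<'EOF'
<rsc_colocation id="fs_on_drbd_stopped" to="rsc_drbd_7788"
    to_role="stopped" from="fs_mirror" score="-infinity"/>
EOF

# Then load them as in the transcript above (not executed here):
#   cibadmin -o constraints -C -x rule_fs_on_drbd_stopped.xml
#   cibadmin -o constraints -C -x rule_fs_on_drbd_slave.xml
```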
> > > > > Doug
> > > >
> > > > > >> http://linux-ha.org/CIB/Idioms/MasterConstraints

--
Dejan
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
