Hmm, kill -9 on the active node is not sufficient to simulate a node
going down. Heartbeat goes away, but the file system remains mounted and
DRBD remains primary on what was the active node.
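A minimal sketch of how that leftover state can be checked after the
kill -9 (assumes the Linux /proc interfaces; the mount point /mnt/pgdata
is a placeholder for whatever your Filesystem resource actually uses):

```shell
#!/bin/sh
# Sketch: after kill -9 of the heartbeat processes, report what is still
# up on this node. /mnt/pgdata is a placeholder mount point.
node_state() {
    # DRBD role line (e.g. "ro:Primary/Secondary") if the module is loaded
    grep -o 'ro:[A-Za-z/]*' /proc/drbd 2>/dev/null || echo "ro:unknown"
    # Is the replicated filesystem still mounted?
    if grep -q ' /mnt/pgdata ' /proc/mounts; then
        echo "mounted"
    else
        echo "not mounted"
    fi
}
node_state
```

On a box without DRBD loaded this just reports "ro:unknown"; on the
killed node it shows the role and mount that survive the kill.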
On Thu, 2007-05-03 at 09:08 -0400, Doug Knight wrote:

> Thanks Dejan, I'll try the kill -9. One thing I'm seeing is that I can
> easily move the resources between nodes using the <location> constraint,
> but if I shut down heartbeat on one node (/etc/init.d/heartbeat stop) I
> run into problems. If I shut down the node with the active resources,
> heartbeat migrates the DRBD Master to the other node but the colocated
> group does not migrate (it remains stopped on the active node). I'm
> digging into that now. If I shut down the node that does not have the
> active resources, the following happens:
> 
> (State: DC on active node1, running drbd master and group resources)
> shutdown node2
> demote attempted on node1 for drbd master, no attempt at halting group
> resources that depend on drbd
> demote of drbd master fails due to "device held open" error, filesystem
> still has it mounted
> loops through continuously trying to demote drbd (spin condition)
> shutdown command never completes, control-C, then kill -9 main heartbeat
> on node1
> drbd:0 goes stopped, :1 Master goes FAILED, group resources all still
> show started
> startup command executed on node1, Bad Things Happen, eventually drbd
> goes unmanaged
> after node1 heartbeat startup completes, stop group and drbd, restart
> resources, everything comes up fine
> 
> I'm going to try a similar test, but using kill -9 right off the bat
> instead of the controlled shutdown. If there's any info I need to
> provide to make this clearer, please, anybody, just let me know.
> 
> Doug
> 
> On Thu, 2007-05-03 at 13:14 +0200, Dejan Muhamedagic wrote:
> 
> > On Fri, Apr 27, 2007 at 03:10:22PM -0400, Doug Knight wrote:
> > > I now have a working configuration with DRBD master/slave, and a
> > > filesystem/pgsql/ipaddr group following it around. So far, I've been
> > > using a Place constraint and modifying its uname value to test the "fail
> > > over" of the resources. Can someone suggest a reasonable set of tests
> > > that most people use to verify other possible error conditions (short of
> > > pulling the plug on one of the servers)?
> > 
> > You can run CTS with your configuration. Otherwise, stopping
> > heartbeat in a way that it doesn't notice being stopped (kill -9)
> > simulates the "pull power plug" condition. You'd also want to
> > make various resources fail.
> > 
> > > Also, the Place constraint is on the
> > > DRBD master/slave, does that make sense or should it be placed on one of
> > > the "higher level" resources like the file system or pgsql?
> > 
> > I don't think it matters; you can go with either, given that the
> > resources are colocated.
> > 
> > > Thanks,
> > > Doug
> > > 
> > > On Thu, 2007-04-26 at 09:45 -0400, Doug Knight wrote:
> > > 
> > > > Hi Alastair,
> > > > Have you encountered a situation where when you first start up the drbd
> > > > master/slave resource, crm_mon and/or the GUI indicate Master status on
> > > > one node, and Started status on the other (as opposed to Slave)? If so,
> > > > how did you correct it?
> > > > 
> > > > Doug
> > > > p.s. Thanks for the scripts and xml, they're a big help!
> > > > 
> 
> 
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems
> 