Is Bugzilla available today? When I try to access the site, I get a "page
not found" error and also a message that it is being merged with another site?

Doug

On Mon, 2007-05-07 at 10:13 +0200, Andrew Beekhof wrote:

> Can you open a bug for this and include the _complete_ logs, as well as
> which version you're running (as I no longer recall)?
> 
> On 5/4/07, Doug Knight <[EMAIL PROTECTED]> wrote:
> > It seems the two nodes in my cluster are behaving differently from each
> > other. First, a mapping of node names to compare against the attached
> > logs:
> >
> > node1 - arc-tkincaidlx
> > node2 - arc-dknightlx
> >
> > References to "the resource group" mean the Filesystem, pgsql, and
> > IPaddr resources, which are colocated and ordered together.
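> > (For reference, a hypothetical sketch of how such a group is typically
> > tied to the drbd master in a Heartbeat 2.x CIB. The ids and resource
> > names here are illustrative, not taken from my config, and exact
> > attribute names vary by version:)

```xml
<!-- Hypothetical Heartbeat 2.x CIB constraints: the group should run only
     where the drbd multi-state resource is master, and should start only
     after the promote. ids and resource names are placeholders. -->
<constraints>
  <rsc_colocation id="colo_group_with_drbd_master" from="group_pgsql"
                  to="ms_drbd_7788" to_role="master" score="INFINITY"/>
  <rsc_order id="order_promote_then_group" from="group_pgsql" action="start"
             to="ms_drbd_7788" to_action="promote" type="after"/>
</constraints>
```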
> >
> > Heartbeat shutdowns and restarts on node1, regardless of whether it is
> > the DC, has active resources, etc., all perform as expected. If the
> > resources are on node1, they migrate successfully to node2. If the
> > location constraint assigns the resources to node1, and node1 re-enters
> > the cluster, all resources migrate back. It's when ANY heartbeat stop,
> > start, or restart occurs on node2 that things break. For instance:
> >
> > node1 is DC, master rsc_drbd_7788:1, group active
> > node2 is slave rsc_drbd_7788:0 ONLY
> > /etc/init.d/heartbeat stop is executed on node2
> > node1 tries to execute a demote on rsc_drbd_7788:1
> > the demote fails because the group is active on node1; Filesystem is
> > holding the drbd device open via its mount point
> > heartbeat continues looping, attempting the demote on node1 about 9
> > times a second
> > heartbeat on node2, where the stop was executed, loops calling
> > notify/pre/demote on rsc_drbd_7788:0 about once a second
> >
> > It takes a manual kill of heartbeat to get things back in order, and in
> > the meantime drbd goes split-brain, or so it seems, judging by what I
> > have to do to get drbd connected again manually. So, the problem is that
> > heartbeat thinks it needs to demote the master rsc_drbd_7788:1 resource,
> > and even if that were correct, it doesn't handle the group resources
> > that depend on it and are ordered/colocated with it. The attached logs
> > cover the entire sequence of events during the shutdown of heartbeat on
> > node2. Times of significance to help in reading the logs:
> >
> > Node2 HB shutdown started at 14:03:31
> > Manually started killing HB on node2 at 14:05:33
> > Node2 completed HB shutdown at 14:06:03
> > Node2 Timer pop at 14:06:33
> > Node1 HB shutdown to try to alleviate looping at 14:07:51
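> > (For the record, the manual drbd reconnection mentioned above looked
> > roughly like the following. This is a sketch, assuming drbd 8.x and a
> > placeholder resource name "r0", not the exact commands from my shell
> > history:)

```shell
# Sketch of a manual drbd split-brain recovery (drbd 8.x assumed;
# "r0" is a placeholder resource name).

# On the node whose changes should be DISCARDED:
drbdadm secondary r0
drbdadm -- --discard-my-data connect r0

# On the surviving node (whose data is kept), reconnect:
drbdadm connect r0

# Then verify both sides report Connected again:
cat /proc/drbd
```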
> >
> > The logs are fairly large due to the looping (I deleted most of the
> > repeated entries, so if more info is needed I can provide the complete
> > logs), and I've zipped them up. If this email exceeds the list's size
> > limits, I respectfully ask the moderator to let it through.
> >
> > Doug Knight
> > WSI, Inc.
> >
> >
> > > > > > digging into that now. If I shutdown the node that does not have the
> > > > > > active resources, the following happens:
> > > > > >
> > > > > > (State: DC on active node1, running drbd master and group resources)
> > > > > > shutdown node2
> > > > > > demote attempted on node1 for drbd master,
> > > > >
> > > > > Why demote? It's the master, running on a good node.
> > > > >
> > > >
> > > > Don't know; this is what I observed. I wondered why it would do a
> > > > demote when this node is already OK.
> > > >
> > > > > > no attempt at halting groups
> > > > > > resources that depend on drbd
> > > > >
> > > > > Why should the resources be stopped? You shut down a node which
> > > > > doesn't have any resources.
> > > > >
> > > >
> >
> > truncated...
> >
> > _______________________________________________
> > Linux-HA mailing list
> > [email protected]
> > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > See also: http://linux-ha.org/ReportingProblems
> >
> >