Is bugzilla available today? When I try to access the site, I get "page not found" and also a message that it is being merged with another site?
Doug

On Mon, 2007-05-07 at 10:13 +0200, Andrew Beekhof wrote:
> can you open a bug for this and include the _complete_ logs as well as
> which version you're running (as I no longer recall)
>
> On 5/4/07, Doug Knight <[EMAIL PROTECTED]> wrote:
> > It seems the two nodes in my cluster are behaving differently from each
> > other. First, some simplification/mapping of node names to compare to
> > the attached logs:
> >
> > node1 - arc-tkincaidlx
> > node2 - arc-dknightlx
> >
> > References to the resource group include the Filesystem, pgsql, and
> > IPaddr colocated and ordered resources.
> >
> > Heartbeat shutdowns and restarts on node1, regardless of whether it is
> > DC, has active resources, etc., all perform as expected. If the
> > resources are on node1, they migrate successfully to node2. If the
> > location constraint sets the resources to node1, and node1 re-enters
> > the cluster, all resources migrate back. It's when ANY heartbeat stop,
> > start, or restart occurs on node2 that things break. For instance:
> >
> > node1 is DC, master rsc_drbd_7788:1, group active
> > node2 is slave rsc_drbd_7788:0 ONLY
> > /etc/init.d/heartbeat stop is executed on node2
> > node1 tries to execute a demote on rsc_drbd_7788:1
> > the demote fails because the group is active on node1; Filesystem is
> > holding the drbd device open via its mount point
> > heartbeat continues to loop trying to demote on node1, about 9 times
> > a second
> > heartbeat on node2, where the stop was executed, loops calling
> > notify/pre/demote on rsc_drbd_7788:0, about once a second
> >
> > It takes a manual kill of heartbeat to get things back in order, and in
> > the meantime drbd goes split brain, or so it seems from what I have to
> > do to manually get drbd connected again. So, the problem is that
> > heartbeat thinks it needs to demote the master rsc_drbd_7788:1
> > resource, and even if this were correct, it doesn't handle the group
> > resources that are dependent on it and ordered/colocated with it.
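[For anyone hitting the same state: the "manually get drbd connected again" step above is the standard manual split-brain recovery. A hedged sketch, assuming DRBD 8.x; the resource name r0 is a placeholder for the real resource, and you must first decide which node holds the data you want to keep:]

```shell
# Check connection state on both nodes; after a split brain the nodes
# typically sit in StandAlone/WFConnection, with "Split-Brain detected"
# in the kernel log.
cat /proc/drbd

# On the node whose changes you are DISCARDING (the split-brain victim):
drbdadm secondary r0
drbdadm -- --discard-my-data connect r0

# On the node whose data survives, reconnect it if it is StandAlone:
drbdadm connect r0
```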
> > The attached logs cover the entire sequence of events during the
> > shutdown of heartbeat on node2. Times of significance to help in
> > looking at the logs are:
> >
> > Node2 HB shutdown started at 14:03:31
> > Manually started killing HB on node2 at 14:05:33
> > Node2 completed HB shutdown at 14:06:03
> > Node2 timer pop at 14:06:33
> > Node1 HB shutdown to try to alleviate looping at 14:07:51
> >
> > The logs are fairly large due to the looping (I deleted most of the
> > looping, so if more info is needed I can provide the complete logs),
> > and I've zipped them up, so if this email exceeds the list's size
> > limits I respectfully ask the moderator to allow it to go through.
> >
> > Doug Knight
> > WSI, Inc.
> >
> > > > > > digging into that now. If I shutdown the node that does not
> > > > > > have the active resources, the following happens:
> > > > > >
> > > > > > (State: DC on active node1, running drbd master and group
> > > > > > resources)
> > > > > > shutdown node2
> > > > > > demote attempted on node1 for drbd master,
> > > > >
> > > > > Why demote? It's a master running on a good node.
> > > > >
> > > > Don't know, this is what I observed. I wondered why it would do a
> > > > demote when this node is already OK.
> > > >
> > > > > > no attempt at halting group resources that depend on drbd
> > > > >
> > > > > Why should the resources be stopped? You shut down a node which
> > > > > doesn't have any resources.
> > > >
> > > > truncated...
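[Side note: the "about 9 times a second" / "about once a second" rates can be pulled out of the logs mechanically. A minimal sketch, assuming syslog-style timestamps of the form "May  4 14:03:31 host daemon: ..."; adjust the regex and keyword for the actual ha-log message text:]

```python
import re
from collections import Counter

def rate_per_second(lines, keyword="demote"):
    """Count lines containing `keyword`, bucketed by the HH:MM:SS
    timestamp in syslog-style log lines."""
    counts = Counter()
    for line in lines:
        m = re.match(r"^\w+\s+\d+\s+(\d\d:\d\d:\d\d)\s", line)
        if m and keyword in line:
            counts[m.group(1)] += 1
    return counts

# Hypothetical sample lines in the assumed format:
sample = [
    "May  4 14:03:31 arc-tkincaidlx crmd: info: demote rsc_drbd_7788:1",
    "May  4 14:03:31 arc-tkincaidlx crmd: info: demote rsc_drbd_7788:1",
    "May  4 14:03:32 arc-tkincaidlx crmd: info: demote rsc_drbd_7788:1",
]
print(rate_per_second(sample))
```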
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
