Juan Manuel Sende Sanchez wrote:
> Hello
>
> I have a two-node cluster with drbd and heartbeat to make openvz
> machines highly available. Everything seems to work, with one exception.
> If I run hb_standby on the main machine (groucho), all the resources move
> to the second machine (harpo) without problems, but when I run
> hb_takeover the only thing that comes back is drbd, leaving openvz stopped.
>
> Here are the logs after the first hb_standby (these are from groucho):
>
> Apr 22 19:31:26 groucho heartbeat: [9171]: info: groucho.yyy.com wants
> to go standby [all]
> Apr 22 19:31:26 groucho heartbeat: [9171]: info: standby: harpo.yyy.com
> can take our all resources
> Apr 22 19:31:26 groucho heartbeat: [23071]: info: give up all HA
> resources (standby).
> Apr 22 19:31:26 groucho ResourceManager[23081]: info: Releasing resource
> group: groucho.yyy.com drbddisk::r0 Filesystem::/dev/drbd0::/vz::ext3
> vz_ha MailTo::[EMAIL PROTECTED]
> Apr 22 19:31:27 groucho ResourceManager[23081]: info: Running
> /etc/ha.d/resource.d/MailTo [EMAIL PROTECTED] stop
> Apr 22 19:31:27 groucho MailTo[23116]: INFO: MailTo Success
> Apr 22 19:31:27 groucho ResourceManager[23081]: info: Running
> /etc/init.d/vz_ha stop
> Apr 22 19:31:38 groucho kernel: VPS: 202: stopped
> Apr 22 19:31:39 groucho kernel: NET: Unregistered protocol family 17
> Apr 22 19:31:39 groucho net.agent[24642]: remove event not handled
> Apr 22 19:31:39 groucho ResourceManager[23081]: info: Running
> /etc/ha.d/resource.d/Filesystem /dev/drbd0 /vz ext3 stop
> Apr 22 19:31:40 groucho Filesystem[24726]: INFO: Running stop for
> /dev/drbd0 on /vz
> Apr 22 19:31:40 groucho Filesystem[24726]: INFO: Trying to unmount /vz
> Apr 22 19:31:40 groucho Filesystem[24726]: INFO: unmounted /vz successfully
> Apr 22 19:31:40 groucho Filesystem[24662]: INFO: Filesystem Success
> Apr 22 19:31:40 groucho ResourceManager[23081]: info: Running
> /etc/ha.d/resource.d/drbddisk r0 stop
> Apr 22 19:31:40 groucho kernel: drbd0: Primary/Secondary -->
> Secondary/Secondary
> Apr 22 19:31:41 groucho heartbeat: [23071]: info: all HA resource
> release completed (standby).
> Apr 22 19:31:41 groucho heartbeat: [9171]: info: Local standby process
> completed [all].
> Apr 22 19:31:41 groucho kernel: drbd0: Secondary/Secondary -->
> Secondary/Primary
> Apr 22 19:31:43 groucho heartbeat: [9171]: WARN: 1 lost packet(s) for
> [harpo.yyy.com] [1904346:1904348]
> Apr 22 19:31:43 groucho heartbeat: [9171]: info: remote resource
> transition completed.
> Apr 22 19:31:43 groucho heartbeat: [9171]: info: No pkts missing from
> harpo.yyy.com!
> Apr 22 19:31:43 groucho heartbeat: [9171]: info: Other node completed
> standby takeover of all resources.
> Apr 22 19:31:49 groucho net.agent[24797]: remove event not handled
>
> Everything seems to have gone OK: you can see heartbeat stopping the
> vz_ha service and the drbd disk service.
>
> Apr 22 20:31:53 harpo ResourceManager[12952]: info: Running
> /etc/ha.d/resource.d/Filesystem /dev/drbd0 /vz ext3 start
> Apr 22 20:31:53 harpo Filesystem[13191]: INFO: Running start for
> /dev/drbd0 on /vz
> Apr 22 20:31:53 harpo kernel: kjournald starting. Commit interval 5 seconds
> Apr 22 20:31:53 harpo kernel: EXT3-fs warning: maximal mount count
> reached, running e2fsck is recommended
> Apr 22 20:31:53 harpo kernel: EXT3 FS on drbd0, internal journal
> Apr 22 20:31:53 harpo kernel: EXT3-fs: mounted filesystem with ordered
> data mode.
> Apr 22 20:31:53 harpo Filesystem[13127]: INFO: Filesystem Success
> Apr 22 20:31:54 harpo MailTo[13286]: WARNING: Don't stat/monitor me!
> MailTo is a pseudo resource agent, so the status reported may be incorrect
> Apr 22 20:31:54 harpo MailTo[13240]: INFO: MailTo Resource is stopped
> Apr 22 20:31:54 harpo ResourceManager[12952]: info: Running
> /etc/ha.d/resource.d/MailTo [EMAIL PROTECTED] start
> Apr 22 20:31:54 harpo MailTo[14346]: INFO: MailTo Success
> Apr 22 20:31:54 harpo heartbeat: [12942]: info: all HA resource
> acquisition completed (standby).
> Apr 22 20:31:54 harpo heartbeat: [8861]: info: Standby resource
> acquisition done [all].
> Apr 22 20:31:54 harpo heartbeat: [8861]: info: remote resource
> transition completed.
>
> But here comes the problem: the drbd service is started and the
> filesystem is mounted, but the vz_ha service isn't started.
>
> Here are the haresources files for harpo and groucho:
>
> groucho.yyy.com drbddisk::r0 Filesystem::/dev/drbd0::/vz::ext3 vz_ha
> MailTo::[EMAIL PROTECTED]
>
>
> I have autofailback off.
>
>
> Can anyone point out where I'm making the mistake? By the way, the mail
> is always sent.
>
> Hope to hear from you.
For an R1 configuration with your problem description, 90% of the time
the vz_ha resource script isn't doing quite the right thing. It's probably
printing OK or Running when asked for its status, regardless of whether
the service is actually running.
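
As a rough illustration of a status action that behaves, here is a minimal
sketch. It is not the poster's actual vz_ha script, which isn't shown. It
assumes the container is VE 202 (taken from the "VPS: 202: stopped" kernel
line in the log) and that vzlist is installed and accepts "-H -o status".
The function names ve_state and vz_ha_status are made up for the example.
The point is that the script prints "running" only when the container is
really up, and otherwise prints something containing neither "running" nor
"OK".

```shell
#!/bin/sh
# Hypothetical sketch of the status branch of /etc/init.d/vz_ha.
# VEID 202 is an assumption based on the "VPS: 202: stopped" log line.
VEID="${VEID:-202}"

# Print the container state as reported by vzlist (assumed to be installed).
# The tr call strips any column padding from the output.
ve_state() {
    vzlist -H -o status "$VEID" 2>/dev/null | tr -d ' '
}

# Print "running" only when the container really is up. Otherwise print a
# word that contains neither "running" nor "OK" and return non-zero.
vz_ha_status() {
    if [ "$(ve_state)" = "running" ]; then
        echo "running"
        return 0
    fi
    echo "stopped"
    return 3
}

# Only the status action is sketched here; start and stop are left out.
case "${1:-}" in
    status) vz_ha_status ;;
esac
```

A quick check is to run "/etc/init.d/vz_ha status" by hand on each node,
once with the container up and once with it down, and confirm that the
output really changes.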
--
Alan Robertson <[EMAIL PROTECTED]>
"Openness is the foundation and preservative of friendship... Let me
claim from you at all times your undisguised opinions." - William
Wilberforce