Hmm, I just restarted the second server to check whether heartbeat moves the VM
to server 1.
I couldn't find any info about that in the log files on the first server
(something like "taking over backend-B1"), and one VM did not start. But some
time after the reboot of server 2 it correctly starts backend-B1:
heartbeat[4959]: 2008/02/28_10:36:19 WARN: Logging daemon is disabled --enabling logging daemon is recommended
heartbeat[4959]: 2008/02/28_10:36:19 info: **************************
heartbeat[4959]: 2008/02/28_10:36:19 info: Configuration validated. Starting heartbeat 2.1.2
heartbeat[4960]: 2008/02/28_10:36:19 info: heartbeat: version 2.1.2
heartbeat[4960]: 2008/02/28_10:36:19 info: Heartbeat generation: 1202824451
heartbeat[4960]: 2008/02/28_10:36:19 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat[4960]: 2008/02/28_10:36:19 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat[4960]: 2008/02/28_10:36:19 info: Removing /var/run/heartbeat/rsctmp failed, recreating.
heartbeat[4960]: 2008/02/28_10:36:19 info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth0
heartbeat[4960]: 2008/02/28_10:36:19 info: glib: ucast: bound send socket to device: eth0
heartbeat[4960]: 2008/02/28_10:36:19 info: glib: ucast: bound receive socket to device: eth0
heartbeat[4960]: 2008/02/28_10:36:19 info: glib: ucast: started on port 694 interface eth0 to 172.20.2.1
heartbeat[4960]: 2008/02/28_10:36:19 info: G_main_add_SignalHandler: Added signal handler for signal 17
heartbeat[4960]: 2008/02/28_10:36:19 info: Local status now set to: 'up'
heartbeat[4960]: 2008/02/28_10:38:20 WARN: node xen-a1.fra1.mailcluster: is dead
heartbeat[4960]: 2008/02/28_10:38:20 info: Comm_now_up(): updating status to active
heartbeat[4960]: 2008/02/28_10:38:20 info: Local status now set to: 'active'
heartbeat[4960]: 2008/02/28_10:38:20 WARN: No STONITH device configured.
heartbeat[4960]: 2008/02/28_10:38:20 WARN: Shared disks are not protected.
heartbeat[4960]: 2008/02/28_10:38:20 info: Resources being acquired from xen-a1.fra1.mailcluster.
harc[4989]: 2008/02/28_10:38:20 info: Running /etc/ha.d/rc.d/status status
heartbeat[4990]: 2008/02/28_10:38:20 info: Local Resource acquisition completed.
mach_down[5019]: 2008/02/28_10:38:20 info: Taking over resource group drbddisk::drbd_backend
ResourceManager[5073]: 2008/02/28_10:38:20 info: Acquiring resource group: xen-a1.fra1.mailcluster drbddisk::drbd_backend xen::backend-A1
ResourceManager[5073]: 2008/02/28_10:38:20 info: Running /etc/ha.d/resource.d/drbddisk drbd_backend start
heartbeat[4960]: 2008/02/28_10:38:30 info: Local Resource acquisition completed. (none)
heartbeat[4960]: 2008/02/28_10:38:30 info: local resource transition completed.
ResourceManager[5073]: 2008/02/28_10:38:32 ERROR: Return code 1 from /etc/ha.d/resource.d/drbddisk
ResourceManager[5073]: 2008/02/28_10:38:32 CRIT: Giving up resources due to failure of drbddisk::drbd_backend
ResourceManager[5073]: 2008/02/28_10:38:32 info: Releasing resource group: xen-a1.fra1.mailcluster drbddisk::drbd_backend xen::backend-A1
ResourceManager[5073]: 2008/02/28_10:38:32 info: Running /etc/ha.d/resource.d/xen backend-A1 stop
ResourceManager[5073]: 2008/02/28_10:38:33 info: Running /etc/ha.d/resource.d/drbddisk drbd_backend stop
mach_down[5019]: 2008/02/28_10:38:33 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down[5019]: 2008/02/28_10:38:33 info: mach_down takeover complete for node xen-a1.fra1.mailcluster.
heartbeat[4960]: 2008/02/28_10:38:33 info: mach_down takeover complete.
heartbeat[4960]: 2008/02/28_10:38:33 info: Initial resource acquisition complete (mach_down)
harc[5232]: 2008/02/28_10:38:33 info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp
ip-request-resp[5232]: 2008/02/28_10:38:33 received ip-request-resp drbddisk::drbd_backend_2 OK yes
ResourceManager[5253]: 2008/02/28_10:38:33 info: Acquiring resource group: xen-b1.fra1.mailcluster drbddisk::drbd_backend_2 xen::backend-B1
ResourceManager[5253]: 2008/02/28_10:38:33 info: Running /etc/ha.d/resource.d/drbddisk drbd_backend_2 start
ResourceManager[5253]: 2008/02/28_10:38:33 info: Running /etc/ha.d/resource.d/xen backend-B1 start
hb_standby[5588]: 2008/02/28_10:39:03 Going standby [foreign].
heartbeat[4960]: 2008/02/28_10:39:03 info: xen-b1.fra1.mailcluster wants to go standby [foreign]
heartbeat[4960]: 2008/02/28_10:39:13 WARN: No reply to standby request. Standby request cancelled
But after a reboot a few minutes earlier, the log file was flooded with this
message:
heartbeat[2966]: 2008/02/28_10:15:34 ERROR: glib: Unable to send [-1] ucast packet: No such device
heartbeat[2966]: 2008/02/28_10:15:34 ERROR: write failure on ucast eth0.: No such device
heartbeat[2966]: 2008/02/28_10:15:34 ERROR: glib: Unable to send [-1] ucast packet: No such device
heartbeat[2966]: 2008/02/28_10:15:34 ERROR: write failure on ucast eth0.: No such device
I stopped iptables, but the error didn't go away; it only stopped after
another reboot. What is the reason for this error?
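For what it's worth, "No such device" from the ucast plugin means the interface named in ha.cf did not exist when heartbeat tried to write to it; on a Xen dom0 this can happen while the network scripts are re-bridging/renaming eth0 at boot. A quick sanity check you could run before starting heartbeat — just a sketch, where `check_iface` is a made-up helper, not part of heartbeat:

```shell
# Sketch: confirm the NIC named in ha.cf's "ucast" line exists before
# starting heartbeat. /sys/class/net is Linux-specific.
check_iface() {
    [ -d "/sys/class/net/$1" ]
}

IFACE=eth0   # the device from "ucast eth0 ..." in ha.cf
if check_iface "$IFACE"; then
    echo "$IFACE present"
else
    echo "$IFACE missing: ucast writes will fail with 'No such device'" >&2
fi
```

If the interface really was absent at boot, ordering the heartbeat init script strictly after networking (and after any Xen bridge setup) would be the thing to look at.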
In ha.cf, should both nodes have a "ucast eth0 172.20.2.1" entry?
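As far as I know, each ucast line should point at the peer's address, and heartbeat ignores ucast packets addressed to one of its own IPs, so a common approach is to keep an identical ha.cf on both nodes listing both addresses. A sketch only (assuming 172.20.1.1 is server 1's interconnect IP and 172.20.2.1 is server 2's, as in this thread), not verified against your heartbeat version:

```
# /etc/ha.d/ha.cf -- sketch; same file on both nodes.
# heartbeat skips the ucast line that matches its own address.
ucast eth0 172.20.1.1   # server 1's interconnect IP
ucast eth0 172.20.2.1   # server 2's interconnect IP
```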
thx
On Thu, Feb 28, 2008 at 11:18 AM, Dejan Muhamedagic <[EMAIL PROTECTED]> wrote:
> Hi,
>
>
> On Thu, Feb 28, 2008 at 08:36:33AM +0100, rupert wrote:
> > Does no one have any ideas on this matter?
>
> This is a drbd related issue. You should be better off in a drbd
> forum.
>
> Thanks,
>
> Dejan
>
>
>
> > thx
> >
> > On Tue, Feb 26, 2008 at 12:10 PM, rupert <[EMAIL PROTECTED]> wrote:
> > > Hello,
> > >
> > > I set up a cluster with two drbd devices and two VMs on each server.
> > > When one server goes down, the other should take over the part of the
> down one.
> > > The drbd goes like this:
> > > a -> a
> > > b <- b
> > >
> > > The other machines are not drbd devices, just some loopback VMs which
> > > carry no data; can they be in the config for heartbeat?
> > >
> > > In my haresources I have the following entries on both servers:
> > >
> > > xen-A1.fra1.mailcluster drbddisk::drbd_backend xen::backend-A1 xen::MX1-A1
> > > xen-B1.fra1.mailcluster drbddisk::drbd_backend_2 xen::backend-B1 xen::MX2-B1
> > >
> > > in ha.cf on the first server I set ucast to
> > > ucast eth0 172.20.1.1
> > > and
> > > ucast eth0 172.20.2.1
> > > on the second server
> > >
> > > When I restart the ha daemon it powers down all the VMs and makes all
> > > the drbd devices primary on the first server, but they should not all
> > > be primary there.
> > >
> > > GIT-hash: b3fe2bdfd3b9f7c2f923186883eb9e2a0d3a5b1b build by [EMAIL PROTECTED], 2008-02-13 19:17:43
> > > 0: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---
> > > ns:135995280 nr:0 dw:779680 dr:135790386 al:224 bm:8602 lo:0 pe:0 ua:0 ap:0
> > > resync: used:0/31 hits:8442668 misses:8308 starving:0 dirty:0 changed:8308
> > > act_log: used:0/257 hits:136296 misses:224 starving:0 dirty:0 changed:224
> > > 1: cs:Connected st:Secondary/Primary ds:UpToDate/UpToDate C r---
> > > ns:0 nr:663968 dw:663968 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
> > > resync: used:0/31 hits:0 misses:0 starving:0 dirty:0 changed:0
> > >
> > > On my first start heartbeat told me that the drbddisk is active and it
> > > shouldn't be, but that is the main drbd disk on each server; the other
> > > one is the backup for failover.
> > >
> > > Resource drbddisk::drbd_backend_2 is active, and should not be!
> > > 2008/02/26_07:42:58 CRITICAL: Non-idle resources can affect data integrity!
> > > 2008/02/26_07:42:58 info: If you don't know what this means, then get help!
> > > 2008/02/26_07:42:58 info: Read the docs and/or source to /usr/share/heartbeat/ResourceManager for more details.
> > > CRITICAL: Resource drbddisk::drbd_backend_2 is active, and should not be!
> > > CRITICAL: Non-idle resources can affect data integrity!
> > > info: If you don't know what this means, then get help!
> > > info: Read the docs and/or the source to /usr/share/heartbeat/ResourceManager for more details.
> > > 2008/02/26_07:42:58 CRITICAL: Non-idle resources will affect resource takeback!
> > > 2008/02/26_07:42:58 CRITICAL: Non-idle resources may affect data integrity!
> > >
> > >
> > > thx for your help
> > >
> > _______________________________________________
> > Linux-HA mailing list
> > [email protected]
> > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > See also: http://linux-ha.org/ReportingProblems
>
> --
> Dejan