I did some googling about the ucast errors, but not much info came up. What could be the cause of this? I rebooted and/or restarted the machines, but on both machines the log always fills with the following:
Feb 29 16:17:15 xen-B1 heartbeat: [2974]: ERROR: write failure on ucast eth0.: No such device
Feb 29 16:17:17 xen-B1 heartbeat: [2974]: ERROR: glib: Unable to send [-1] ucast packet: No such device
Feb 29 16:17:17 xen-B1 heartbeat: [2974]: ERROR: write failure on ucast eth0.: No such device
Feb 29 16:17:19 xen-B1 heartbeat: [2974]: ERROR: glib: Unable to send [-1] ucast packet: No such device
Feb 29 16:17:19 xen-B1 heartbeat: [2974]: ERROR: write failure on ucast eth0.: No such device
--
Feb 29 16:18:39 xen-A1 heartbeat: [2936]: ERROR: glib: Unable to send [-1] ucast packet: No such device
Feb 29 16:18:39 xen-A1 heartbeat: [2936]: ERROR: write failure on ucast eth0.: No such device
--

are these related?

[2008-02-29 09:31:26 xend 3575] DEBUG (DevController:149) Waiting for 2050.
[2008-02-29 09:31:26 xend 3575] DEBUG (DevController:476) hotplugStatusCallback /local/domain/0/backend/vbd/2/2050/hotplug-status.
[2008-02-29 09:31:26 xend 3575] DEBUG (DevController:490) hotplugStatusCallback 1.
[2008-02-29 09:31:26 xend 3575] DEBUG (DevController:143) Waiting for devices irq.
[2008-02-29 09:31:26 xend 3575] DEBUG (DevController:143) Waiting for devices vkbd.
[2008-02-29 09:31:26 xend 3575] DEBUG (DevController:143) Waiting for devices vfb.
[2008-02-29 09:31:26 xend 3575] DEBUG (DevController:143) Waiting for devices pci.
[2008-02-29 09:31:26 xend 3575] DEBUG (DevController:143) Waiting for devices ioports.
[2008-02-29 09:31:26 xend 3575] DEBUG (DevController:143) Waiting for devices tap.
[2008-02-29 09:31:26 xend 3575] DEBUG (DevController:143) Waiting for devices vtpm.

On both machines there are a couple of network services that run fine through eth0, so the device is up. Can this be because Xen created some iptables rules?
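A note on the question above: iptables rules would silently drop packets, not produce ENODEV; "No such device" means the kernel had no interface named eth0 at the moment heartbeat tried to write. On Xen hosts this commonly happens while the default network-bridge script is rebuilding the bridge (it temporarily renames eth0), which would fit the "only went away after a reboot" symptom. A minimal sketch to check whether the device currently exists; `iface_exists` is a hypothetical helper, not part of heartbeat:

```shell
#!/bin/sh
# Sketch: distinguish "interface missing" (ENODEV) from a firewall problem.
# Every interface the kernel currently knows has a directory in /sys/class/net.

iface_exists() {
    # returns success (0) if the named interface is present
    [ -d "/sys/class/net/$1" ]
}

for dev in lo eth0; do
    if iface_exists "$dev"; then
        echo "$dev: present"
    else
        echo "$dev: MISSING (heartbeat on this interface would log ENODEV)"
    fi
done
```

If eth0 shows as present while the errors continue, the next thing to rule out would be Xen renaming or re-creating the interface under heartbeat.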
thx for your help
Heiko

On Fri, Feb 29, 2008 at 9:33 AM, rupert <[EMAIL PROTECTED]> wrote:
> it works now much better, both systems did a reboot (dont know why),
> and now both VMs are running on the first server, so how can i get the
> second server to take back the 2nd VM?
>
> On Thu, Feb 28, 2008 at 1:19 PM, Dejan Muhamedagic <[EMAIL PROTECTED]> wrote:
> > Hi,
> >
> > On Thu, Feb 28, 2008 at 12:11:31PM +0100, rupert wrote:
> > > mmh, i just restarted the 2nd server to check if heartbeat moves the VM
> > > to server1.
> > > I couldnt find any info about that in the logfiles on the first
> > > server, something like taking over backend-B1,
> > > and one VM did not start. But some time after the reboot of server2
> > > it correctly starts backend-B1.
> > >
> > > heartbeat[4959]: 2008/02/28_10:36:19 WARN: Logging daemon is disabled --enabling logging daemon is recommended
> > > heartbeat[4959]: 2008/02/28_10:36:19 info: **************************
> > > heartbeat[4959]: 2008/02/28_10:36:19 info: Configuration validated. Starting heartbeat 2.1.2
> > > heartbeat[4960]: 2008/02/28_10:36:19 info: heartbeat: version 2.1.2
> > > heartbeat[4960]: 2008/02/28_10:36:19 info: Heartbeat generation: 1202824451
> > > heartbeat[4960]: 2008/02/28_10:36:19 info: G_main_add_TriggerHandler: Added signal manual handler
> > > heartbeat[4960]: 2008/02/28_10:36:19 info: G_main_add_TriggerHandler: Added signal manual handler
> > > heartbeat[4960]: 2008/02/28_10:36:19 info: Removing /var/run/heartbeat/rsctmp failed, recreating.
> > > heartbeat[4960]: 2008/02/28_10:36:19 info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth0
> > > heartbeat[4960]: 2008/02/28_10:36:19 info: glib: ucast: bound send socket to device: eth0
> > > heartbeat[4960]: 2008/02/28_10:36:19 info: glib: ucast: bound receive socket to device: eth0
> > > heartbeat[4960]: 2008/02/28_10:36:19 info: glib: ucast: started on port 694 interface eth0 to 172.20.2.1
> > > heartbeat[4960]: 2008/02/28_10:36:19 info: G_main_add_SignalHandler: Added signal handler for signal 17
> > > heartbeat[4960]: 2008/02/28_10:36:19 info: Local status now set to: 'up'
> > > heartbeat[4960]: 2008/02/28_10:38:20 WARN: node xen-a1.fra1.mailcluster: is dead
> > > heartbeat[4960]: 2008/02/28_10:38:20 info: Comm_now_up(): updating status to active
> > > heartbeat[4960]: 2008/02/28_10:38:20 info: Local status now set to: 'active'
> > > heartbeat[4960]: 2008/02/28_10:38:20 WARN: No STONITH device configured.
> > > heartbeat[4960]: 2008/02/28_10:38:20 WARN: Shared disks are not protected.
> > > heartbeat[4960]: 2008/02/28_10:38:20 info: Resources being acquired from xen-a1.fra1.mailcluster.
> > > harc[4989]: 2008/02/28_10:38:20 info: Running /etc/ha.d/rc.d/status status
> > > heartbeat[4990]: 2008/02/28_10:38:20 info: Local Resource acquisition completed.
> > > mach_down[5019]: 2008/02/28_10:38:20 info: Taking over resource group drbddisk::drbd_backend
> > > ResourceManager[5073]: 2008/02/28_10:38:20 info: Acquiring resource group: xen-a1.fra1.mailcluster drbddisk::drbd_backend xen::backend-A1
> > > ResourceManager[5073]: 2008/02/28_10:38:20 info: Running /etc/ha.d/resource.d/drbddisk drbd_backend start
> > > heartbeat[4960]: 2008/02/28_10:38:30 info: Local Resource acquisition completed. (none)
> > > heartbeat[4960]: 2008/02/28_10:38:30 info: local resource transition completed.
> > > ResourceManager[5073]: 2008/02/28_10:38:32 ERROR: Return code 1 from /etc/ha.d/resource.d/drbddisk
> > > ResourceManager[5073]: 2008/02/28_10:38:32 CRIT: Giving up resources due to failure of drbddisk::drbd_backend
> >
> > You have to find out why drbddisk is failing.
> >
> > > ResourceManager[5073]: 2008/02/28_10:38:32 info: Releasing resource group: xen-a1.fra1.mailcluster drbddisk::drbd_backend xen::backend-A1
> > > ResourceManager[5073]: 2008/02/28_10:38:32 info: Running /etc/ha.d/resource.d/xen backend-A1 stop
> > > ResourceManager[5073]: 2008/02/28_10:38:33 info: Running /etc/ha.d/resource.d/drbddisk drbd_backend stop
> > > mach_down[5019]: 2008/02/28_10:38:33 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
> > > mach_down[5019]: 2008/02/28_10:38:33 info: mach_down takeover complete for node xen-a1.fra1.mailcluster.
> > > heartbeat[4960]: 2008/02/28_10:38:33 info: mach_down takeover complete.
> > > heartbeat[4960]: 2008/02/28_10:38:33 info: Initial resource acquisition complete (mach_down)
> > > harc[5232]: 2008/02/28_10:38:33 info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp
> > > ip-request-resp[5232]: 2008/02/28_10:38:33 received ip-request-resp drbddisk::drbd_backend_2 OK yes
> > > ResourceManager[5253]: 2008/02/28_10:38:33 info: Acquiring resource group: xen-b1.fra1.mailcluster drbddisk::drbd_backend_2 xen::backend-B1
> > > ResourceManager[5253]: 2008/02/28_10:38:33 info: Running /etc/ha.d/resource.d/drbddisk drbd_backend_2 start
> > > ResourceManager[5253]: 2008/02/28_10:38:33 info: Running /etc/ha.d/resource.d/xen backend-B1 start
> > > hb_standby[5588]: 2008/02/28_10:39:03 Going standby [foreign].
> > > heartbeat[4960]: 2008/02/28_10:39:03 info: xen-b1.fra1.mailcluster wants to go standby [foreign]
> > > heartbeat[4960]: 2008/02/28_10:39:13 WARN: No reply to standby request. Standby request cancelled
> > >
> > > but after a reboot some minutes before, i had the logfile flooded with
> > > this message:
> > >
> > > heartbeat[2966]: 2008/02/28_10:15:34 ERROR: glib: Unable to send [-1] ucast packet: No such device
> > > heartbeat[2966]: 2008/02/28_10:15:34 ERROR: write failure on ucast eth0.: No such device
> > > heartbeat[2966]: 2008/02/28_10:15:34 ERROR: glib: Unable to send [-1] ucast packet: No such device
> > > heartbeat[2966]: 2008/02/28_10:15:34 ERROR: write failure on ucast eth0.: No such device
> >
> > Well, looks like eth0 doesn't exist.
> >
> > > I stopped iptables, but it didnt go away, only after another reboot.
> > > What is the reason for this error?
> > >
> > > in ha.cf, should both nodes have a "ucast eth0 172.20.2.1" entry?
> >
> > No. It should be ucast eth0 node2-ipaddress on node1 and vice
> > versa on node2. To simplify management, you can put both ucast
> > directives on both nodes. I believe that this is well documented
> > in ha.cf.
> >
> > Thanks,
> >
> > Dejan
> >
> > > thx
> > >
> > > On Thu, Feb 28, 2008 at 11:18 AM, Dejan Muhamedagic <[EMAIL PROTECTED]> wrote:
> > > > Hi,
> > > >
> > > > On Thu, Feb 28, 2008 at 08:36:33AM +0100, rupert wrote:
> > > > > has no one any ideas on this matter?
> > > >
> > > > This is a drbd related issue. You would be better off in a drbd
> > > > forum.
> > > >
> > > > Thanks,
> > > >
> > > > Dejan
> > > >
> > > > > thx
> > > > >
> > > > > On Tue, Feb 26, 2008 at 12:10 PM, rupert <[EMAIL PROTECTED]> wrote:
> > > > > > Hello,
> > > > > >
> > > > > > i set up a cluster with 2 drbd devices and 2 VMs on each server.
> > > > > > When one server goes down, the other should take over the part of the down one.
> > > > > > The drbd goes like this:
> > > > > > a -> a
> > > > > > b <- b
> > > > > >
> > > > > > the other machines are not drbd devices, just some loopback VMs which
> > > > > > carry no data,
> > > > > > can they be in the config for heartbeat?
> > > > > >
> > > > > > in my haresources I have the following entries on both servers:
> > > > > >
> > > > > > xen-A1.fra1.mailcluster drbddisk::drbd_backend xen::backend-A1 xen::MX1-A1
> > > > > > xen-B1.fra1.mailcluster drbddisk::drbd_backend_2 xen::backend-B1 xen::MX2-B1
> > > > > >
> > > > > > in ha.cf on the first server I set ucast to
> > > > > > ucast eth0 172.20.1.1
> > > > > > and
> > > > > > ucast eth0 172.20.2.1
> > > > > > on the second server
> > > > > >
> > > > > > when i restart the ha daemon it powers down all the VMs and makes
> > > > > > all the drbd devices on the first server primary, but they should be
> > > > > > on the first server
> > > > > >
> > > > > > GIT-hash: b3fe2bdfd3b9f7c2f923186883eb9e2a0d3a5b1b build by [EMAIL PROTECTED], 2008-02-13 19:17:43
> > > > > >  0: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---
> > > > > >     ns:135995280 nr:0 dw:779680 dr:135790386 al:224 bm:8602 lo:0 pe:0 ua:0 ap:0
> > > > > >     resync: used:0/31 hits:8442668 misses:8308 starving:0 dirty:0 changed:8308
> > > > > >     act_log: used:0/257 hits:136296 misses:224 starving:0 dirty:0 changed:224
> > > > > >  1: cs:Connected st:Secondary/Primary ds:UpToDate/UpToDate C r---
> > > > > >     ns:0 nr:663968 dw:663968 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
> > > > > >     resync: used:0/31 hits:0 misses:0 starving:0 dirty:0 changed:0
> > > > > >
> > > > > > on my first start heartbeat told me that the drbddisk is active and it
> > > > > > shouldnt be,
> > > > > > but it's the one that is the main drbddisk on each server; the other is
> > > > > > the backup for failovers.
> > > > > >
> > > > > > Resource drbddisk::drbd_backend_2 is active, and should not be!
> > > > > > 2008/02/26_07:42:58 CRITICAL: Non-idle resources can affect data integrity!
> > > > > > 2008/02/26_07:42:58 info: If you don't know what this means, then get help!
> > > > > > 2008/02/26_07:42:58 info: Read the docs and/or source to /usr/share/heartbeat/ResourceManager for more details.
> > > > > > CRITICAL: Resource drbddisk::drbd_backend_2 is active, and should not be!
> > > > > > CRITICAL: Non-idle resources can affect data integrity!
> > > > > > info: If you don't know what this means, then get help!
> > > > > > info: Read the docs and/or the source to /usr/share/heartbeat/ResourceManager for more details.
> > > > > > 2008/02/26_07:42:58 CRITICAL: Non-idle resources will affect resource takeback!
> > > > > > 2008/02/26_07:42:58 CRITICAL: Non-idle resources may affect data integrity!
> > > > > >
> > > > > > thx for your help
> > > >
> > > > --
> > > > Dejan

_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
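For reference, Dejan's advice on the ucast directives in the thread above can be sketched as ha.cf fragments. The IP addresses are the ones from the thread; that 172.20.1.1 belongs to xen-A1 and 172.20.2.1 to xen-B1 is an assumption based on how they are used in the posts:

```
# /etc/ha.d/ha.cf on xen-A1 (assumed to be 172.20.1.1):
# send unicast heartbeats to the PEER's address, not your own
ucast eth0 172.20.2.1

# /etc/ha.d/ha.cf on xen-B1 (assumed to be 172.20.2.1):
ucast eth0 172.20.1.1

# Alternatively, per Dejan, keep both files identical by listing
# both ucast directives on both nodes:
ucast eth0 172.20.1.1
ucast eth0 172.20.2.1
```

The original configuration quoted in the thread (each node pointing ucast at its own address) would explain heartbeats never reaching the peer even when eth0 is up.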
