It works much better now. Both systems rebooted (I don't know why), and now both VMs are running on the first server. So how can I get the second server to take back the 2nd VM?
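A minimal way to do that by hand, assuming the hb_standby/hb_takeover helper scripts that ship with heartbeat (installed alongside mach_down, whose /usr/share/heartbeat path appears in the logs quoted below):

    # On xen-b1, the node that should own backend-B1 again, reclaim the
    # resources assigned to it in haresources:
    /usr/share/heartbeat/hb_takeover local

    # Equivalently, on xen-a1, the node currently holding everything,
    # give up the resource group taken over from the peer:
    /usr/share/heartbeat/hb_standby foreign

Note that the logs below already show a standby request being issued and then cancelled ("No reply to standby request"), so if that happens again, the handover itself still needs debugging.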
On Thu, Feb 28, 2008 at 1:19 PM, Dejan Muhamedagic <[EMAIL PROTECTED]> wrote:
> Hi,
>
> On Thu, Feb 28, 2008 at 12:11:31PM +0100, rupert wrote:
> > Hmm, I just restarted the 2nd server to check whether heartbeat moves
> > the VM to server1. I couldn't find any info about that in the logfiles
> > on the first server (something like "taking over backend-B1"), and one
> > VM did not start. But some time after the reboot of server2, it
> > correctly started backend-B1.
> >
> > heartbeat[4959]: 2008/02/28_10:36:19 WARN: Logging daemon is disabled --enabling logging daemon is recommended
> > heartbeat[4959]: 2008/02/28_10:36:19 info: **************************
> > heartbeat[4959]: 2008/02/28_10:36:19 info: Configuration validated. Starting heartbeat 2.1.2
> > heartbeat[4960]: 2008/02/28_10:36:19 info: heartbeat: version 2.1.2
> > heartbeat[4960]: 2008/02/28_10:36:19 info: Heartbeat generation: 1202824451
> > heartbeat[4960]: 2008/02/28_10:36:19 info: G_main_add_TriggerHandler: Added signal manual handler
> > heartbeat[4960]: 2008/02/28_10:36:19 info: G_main_add_TriggerHandler: Added signal manual handler
> > heartbeat[4960]: 2008/02/28_10:36:19 info: Removing /var/run/heartbeat/rsctmp failed, recreating.
> > heartbeat[4960]: 2008/02/28_10:36:19 info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth0
> > heartbeat[4960]: 2008/02/28_10:36:19 info: glib: ucast: bound send socket to device: eth0
> > heartbeat[4960]: 2008/02/28_10:36:19 info: glib: ucast: bound receive socket to device: eth0
> > heartbeat[4960]: 2008/02/28_10:36:19 info: glib: ucast: started on port 694 interface eth0 to 172.20.2.1
> > heartbeat[4960]: 2008/02/28_10:36:19 info: G_main_add_SignalHandler: Added signal handler for signal 17
> > heartbeat[4960]: 2008/02/28_10:36:19 info: Local status now set to: 'up'
> > heartbeat[4960]: 2008/02/28_10:38:20 WARN: node xen-a1.fra1.mailcluster: is dead
> > heartbeat[4960]: 2008/02/28_10:38:20 info: Comm_now_up(): updating status to active
> > heartbeat[4960]: 2008/02/28_10:38:20 info: Local status now set to: 'active'
> > heartbeat[4960]: 2008/02/28_10:38:20 WARN: No STONITH device configured.
> > heartbeat[4960]: 2008/02/28_10:38:20 WARN: Shared disks are not protected.
> > heartbeat[4960]: 2008/02/28_10:38:20 info: Resources being acquired from xen-a1.fra1.mailcluster.
> > harc[4989]: 2008/02/28_10:38:20 info: Running /etc/ha.d/rc.d/status status
> > heartbeat[4990]: 2008/02/28_10:38:20 info: Local Resource acquisition completed.
> > mach_down[5019]: 2008/02/28_10:38:20 info: Taking over resource group drbddisk::drbd_backend
> > ResourceManager[5073]: 2008/02/28_10:38:20 info: Acquiring resource group: xen-a1.fra1.mailcluster drbddisk::drbd_backend xen::backend-A1
> > ResourceManager[5073]: 2008/02/28_10:38:20 info: Running /etc/ha.d/resource.d/drbddisk drbd_backend start
> > heartbeat[4960]: 2008/02/28_10:38:30 info: Local Resource acquisition completed. (none)
> > heartbeat[4960]: 2008/02/28_10:38:30 info: local resource transition completed.
> > ResourceManager[5073]: 2008/02/28_10:38:32 ERROR: Return code 1 from /etc/ha.d/resource.d/drbddisk
> > ResourceManager[5073]: 2008/02/28_10:38:32 CRIT: Giving up resources due to failure of drbddisk::drbd_backend
>
> You have to find out why drbddisk is failing.
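To see why drbddisk returns 1, it can be run by hand exactly as the ResourceManager runs it in the log above; a quick sketch, using the resource names from this thread:

    # Invoke the same script with the same arguments heartbeat uses:
    /etc/ha.d/resource.d/drbddisk drbd_backend start
    echo $?    # exit code 1 here matches the "Return code 1" in the log

    # drbddisk is a thin wrapper that promotes the drbd resource, so also
    # check drbd's own view (assuming drbd 8.x, where "state" prints the roles):
    cat /proc/drbd
    drbdadm state drbd_backend

A promotion is refused, for instance, when the local disk is not UpToDate or the connected peer is still Primary, so those are the first things to look for in the output.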
> > ResourceManager[5073]: 2008/02/28_10:38:32 info: Releasing resource group: xen-a1.fra1.mailcluster drbddisk::drbd_backend xen::backend-A1
> > ResourceManager[5073]: 2008/02/28_10:38:32 info: Running /etc/ha.d/resource.d/xen backend-A1 stop
> > ResourceManager[5073]: 2008/02/28_10:38:33 info: Running /etc/ha.d/resource.d/drbddisk drbd_backend stop
> > mach_down[5019]: 2008/02/28_10:38:33 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
> > mach_down[5019]: 2008/02/28_10:38:33 info: mach_down takeover complete for node xen-a1.fra1.mailcluster.
> > heartbeat[4960]: 2008/02/28_10:38:33 info: mach_down takeover complete.
> > heartbeat[4960]: 2008/02/28_10:38:33 info: Initial resource acquisition complete (mach_down)
> > harc[5232]: 2008/02/28_10:38:33 info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp
> > ip-request-resp[5232]: 2008/02/28_10:38:33 received ip-request-resp drbddisk::drbd_backend_2 OK yes
> > ResourceManager[5253]: 2008/02/28_10:38:33 info: Acquiring resource group: xen-b1.fra1.mailcluster drbddisk::drbd_backend_2 xen::backend-B1
> > ResourceManager[5253]: 2008/02/28_10:38:33 info: Running /etc/ha.d/resource.d/drbddisk drbd_backend_2 start
> > ResourceManager[5253]: 2008/02/28_10:38:33 info: Running /etc/ha.d/resource.d/xen backend-B1 start
> > hb_standby[5588]: 2008/02/28_10:39:03 Going standby [foreign].
> > heartbeat[4960]: 2008/02/28_10:39:03 info: xen-b1.fra1.mailcluster wants to go standby [foreign]
> > heartbeat[4960]: 2008/02/28_10:39:13 WARN: No reply to standby request. Standby request cancelled.
> >
> > But after a reboot some minutes earlier, the logfile was flooded with this message:
> >
> > heartbeat[2966]: 2008/02/28_10:15:34 ERROR: glib: Unable to send [-1] ucast packet: No such device
> > heartbeat[2966]: 2008/02/28_10:15:34 ERROR: write failure on ucast eth0.: No such device
> > heartbeat[2966]: 2008/02/28_10:15:34 ERROR: glib: Unable to send [-1] ucast packet: No such device
> > heartbeat[2966]: 2008/02/28_10:15:34 ERROR: write failure on ucast eth0.: No such device
>
> Well, it looks like eth0 doesn't exist.
>
> > I stopped iptables, but the error didn't go away; it only stopped after
> > another reboot. What is the reason for this error?
> >
> > In ha.cf, should both nodes have a "ucast eth0 172.20.2.1" entry?
>
> No. It should be "ucast eth0 node2-ipaddress" on node1 and vice versa
> on node2. To simplify management, you can put both ucast directives on
> both nodes. I believe that this is well documented in ha.cf.
>
> Thanks,
>
> Dejan
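Concretely, Dejan's suggestion amounts to ha.cf lines like the following; a sketch that assumes 172.20.1.1 belongs to xen-a1 and 172.20.2.1 to xen-b1 (the thread does not say which address is which node). Since heartbeat skips a ucast directive that points at the local node's own address, the same two lines can go on both nodes:

    # /etc/ha.d/ha.cf, identical on both nodes:
    ucast eth0 172.20.1.1    # used by xen-b1 to reach xen-a1; ignored on xen-a1
    ucast eth0 172.20.2.1    # used by xen-a1 to reach xen-b1; ignored on xen-b1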
> > Thanks.
> >
> > On Thu, Feb 28, 2008 at 11:18 AM, Dejan Muhamedagic <[EMAIL PROTECTED]> wrote:
> > > Hi,
> > >
> > > On Thu, Feb 28, 2008 at 08:36:33AM +0100, rupert wrote:
> > > > Does no one have any ideas on this matter?
> > >
> > > This is a drbd-related issue. You would be better off in a drbd
> > > forum.
> > >
> > > Thanks,
> > >
> > > Dejan
> > >
> > > > Thanks.
> > > >
> > > > On Tue, Feb 26, 2008 at 12:10 PM, rupert <[EMAIL PROTECTED]> wrote:
> > > > > Hello,
> > > > >
> > > > > I set up a cluster with 2 drbd devices and 2 VMs on each server.
> > > > > When one server goes down, the other should take over the down
> > > > > one's share. The drbd setup goes like this:
> > > > >
> > > > > a -> a
> > > > > b <- b
> > > > >
> > > > > The other machines are not on drbd devices, just some
> > > > > loopback-backed VMs which carry no data; can they be in the
> > > > > heartbeat config?
> > > > >
> > > > > In my haresources I have the following entries on both servers:
> > > > >
> > > > > xen-A1.fra1.mailcluster drbddisk::drbd_backend xen::backend-A1 xen::MX1-A1
> > > > > xen-B1.fra1.mailcluster drbddisk::drbd_backend_2 xen::backend-B1 xen::MX2-B1
> > > > >
> > > > > In ha.cf I set ucast to
> > > > > ucast eth0 172.20.1.1
> > > > > on the first server and
> > > > > ucast eth0 172.20.2.1
> > > > > on the second server.
> > > > >
> > > > > When I restart the ha daemon, it powers down all the VMs and makes
> > > > > all the drbd devices primary on the first server, but they should
> > > > > not all be on the first server:
> > > > >
> > > > > GIT-hash: b3fe2bdfd3b9f7c2f923186883eb9e2a0d3a5b1b build by [EMAIL PROTECTED], 2008-02-13 19:17:43
> > > > > 0: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---
> > > > >     ns:135995280 nr:0 dw:779680 dr:135790386 al:224 bm:8602 lo:0 pe:0 ua:0 ap:0
> > > > >     resync: used:0/31 hits:8442668 misses:8308 starving:0 dirty:0 changed:8308
> > > > >     act_log: used:0/257 hits:136296 misses:224 starving:0 dirty:0 changed:224
> > > > > 1: cs:Connected st:Secondary/Primary ds:UpToDate/UpToDate C r---
> > > > >     ns:0 nr:663968 dw:663968 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
> > > > >     resync: used:0/31 hits:0 misses:0 starving:0 dirty:0 changed:0
> > > > >
> > > > > On my first start, heartbeat told me that the drbddisk is active
> > > > > and it shouldn't be, but it is the main drbddisk on each server;
> > > > > the other one is the backup for failover.
> > > > >
> > > > > Resource drbddisk::drbd_backend_2 is active, and should not be!
> > > > > 2008/02/26_07:42:58 CRITICAL: Non-idle resources can affect data integrity!
> > > > > 2008/02/26_07:42:58 info: If you don't know what this means, then get help!
> > > > > 2008/02/26_07:42:58 info: Read the docs and/or source to /usr/share/heartbeat/ResourceManager for more details.
> > > > > CRITICAL: Resource drbddisk::drbd_backend_2 is active, and should not be!
> > > > > CRITICAL: Non-idle resources can affect data integrity!
> > > > > info: If you don't know what this means, then get help!
> > > > > info: Read the docs and/or the source to /usr/share/heartbeat/ResourceManager for more details.
> > > > > 2008/02/26_07:42:58 CRITICAL: Non-idle resources will affect resource takeback!
> > > > > 2008/02/26_07:42:58 CRITICAL: Non-idle resources may affect data integrity!
> > > > >
> > > > > Thanks for your help.
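The "is active, and should not be" message means heartbeat found drbd_backend_2 already Primary on the node before it had acquired that resource group. One way to check, and if necessary clean up, by hand before starting heartbeat (a sketch; resource names taken from the haresources above):

    # On each node: which role does it currently hold for each resource?
    drbdadm state drbd_backend
    drbdadm state drbd_backend_2

    # If a node is Primary for a resource heartbeat has not assigned to it,
    # stop whatever uses the device (here, the Xen VM) and demote it:
    drbdadm secondary drbd_backend_2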
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
