It works much better now. Both systems rebooted (I don't know why), and now both VMs are running on the first server. So how can I get the second server to take back the 2nd VM?
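A minimal way to do that by hand, assuming the hb_standby/hb_takeover helper scripts that ship with heartbeat (installed alongside mach_down, whose /usr/share/heartbeat path appears in the logs quoted below):

    # On xen-b1, the node that should own backend-B1 again, reclaim the
    # resources assigned to it in haresources:
    /usr/share/heartbeat/hb_takeover local

    # Equivalently, on xen-a1, the node currently holding everything,
    # give up the resource group taken over from the peer:
    /usr/share/heartbeat/hb_standby foreign

Note that the logs below already show a standby request being issued and then cancelled ("No reply to standby request"), so if that happens again, the handover itself still needs debugging.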
On Thu, Feb 28, 2008 at 1:19 PM, Dejan Muhamedagic <[EMAIL PROTECTED]> wrote:
> Hi,
>
> On Thu, Feb 28, 2008 at 12:11:31PM +0100, rupert wrote:
> > Hmm, I just restarted the 2nd server to check whether heartbeat moves
> > the VM to server1. I couldn't find any info about that in the logfiles
> > on the first server (something like "taking over backend-B1"), and one
> > VM did not start. But some time after the reboot of server2, it
> > correctly started backend-B1.
> >
> > heartbeat[4959]: 2008/02/28_10:36:19 WARN: Logging daemon is disabled --enabling logging daemon is recommended
> > heartbeat[4959]: 2008/02/28_10:36:19 info: **************************
> > heartbeat[4959]: 2008/02/28_10:36:19 info: Configuration validated. Starting heartbeat 2.1.2
> > heartbeat[4960]: 2008/02/28_10:36:19 info: heartbeat: version 2.1.2
> > heartbeat[4960]: 2008/02/28_10:36:19 info: Heartbeat generation: 1202824451
> > heartbeat[4960]: 2008/02/28_10:36:19 info: G_main_add_TriggerHandler: Added signal manual handler
> > heartbeat[4960]: 2008/02/28_10:36:19 info: G_main_add_TriggerHandler: Added signal manual handler
> > heartbeat[4960]: 2008/02/28_10:36:19 info: Removing /var/run/heartbeat/rsctmp failed, recreating.
> > heartbeat[4960]: 2008/02/28_10:36:19 info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth0
> > heartbeat[4960]: 2008/02/28_10:36:19 info: glib: ucast: bound send socket to device: eth0
> > heartbeat[4960]: 2008/02/28_10:36:19 info: glib: ucast: bound receive socket to device: eth0
> > heartbeat[4960]: 2008/02/28_10:36:19 info: glib: ucast: started on port 694 interface eth0 to 172.20.2.1
> > heartbeat[4960]: 2008/02/28_10:36:19 info: G_main_add_SignalHandler: Added signal handler for signal 17
> > heartbeat[4960]: 2008/02/28_10:36:19 info: Local status now set to: 'up'
> > heartbeat[4960]: 2008/02/28_10:38:20 WARN: node xen-a1.fra1.mailcluster: is dead
> > heartbeat[4960]: 2008/02/28_10:38:20 info: Comm_now_up(): updating status to active
> > heartbeat[4960]: 2008/02/28_10:38:20 info: Local status now set to: 'active'
> > heartbeat[4960]: 2008/02/28_10:38:20 WARN: No STONITH device configured.
> > heartbeat[4960]: 2008/02/28_10:38:20 WARN: Shared disks are not protected.
> > heartbeat[4960]: 2008/02/28_10:38:20 info: Resources being acquired from xen-a1.fra1.mailcluster.
> > harc[4989]: 2008/02/28_10:38:20 info: Running /etc/ha.d/rc.d/status status
> > heartbeat[4990]: 2008/02/28_10:38:20 info: Local Resource acquisition completed.
> > mach_down[5019]: 2008/02/28_10:38:20 info: Taking over resource group drbddisk::drbd_backend
> > ResourceManager[5073]: 2008/02/28_10:38:20 info: Acquiring resource group: xen-a1.fra1.mailcluster drbddisk::drbd_backend xen::backend-A1
> > ResourceManager[5073]: 2008/02/28_10:38:20 info: Running /etc/ha.d/resource.d/drbddisk drbd_backend start
> > heartbeat[4960]: 2008/02/28_10:38:30 info: Local Resource acquisition completed. (none)
> > heartbeat[4960]: 2008/02/28_10:38:30 info: local resource transition completed.
> > ResourceManager[5073]: 2008/02/28_10:38:32 ERROR: Return code 1 from /etc/ha.d/resource.d/drbddisk
> > ResourceManager[5073]: 2008/02/28_10:38:32 CRIT: Giving up resources due to failure of drbddisk::drbd_backend
>
> You have to find out why drbddisk is failing.
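To see why drbddisk returns 1, it can be run by hand exactly as the ResourceManager runs it in the log above; a quick sketch, using the resource names from this thread:

    # Invoke the same script with the same arguments heartbeat uses:
    /etc/ha.d/resource.d/drbddisk drbd_backend start
    echo $?    # exit code 1 here matches the "Return code 1" in the log

    # drbddisk is a thin wrapper that promotes the drbd resource, so also
    # check drbd's own view (assuming drbd 8.x, where "state" prints the roles):
    cat /proc/drbd
    drbdadm state drbd_backend

A promotion is refused, for instance, when the local disk is not UpToDate or the connected peer is still Primary, so those are the first things to look for in the output.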
> > ResourceManager[5073]: 2008/02/28_10:38:32 info: Releasing resource group: xen-a1.fra1.mailcluster drbddisk::drbd_backend xen::backend-A1
> > ResourceManager[5073]: 2008/02/28_10:38:32 info: Running /etc/ha.d/resource.d/xen backend-A1 stop
> > ResourceManager[5073]: 2008/02/28_10:38:33 info: Running /etc/ha.d/resource.d/drbddisk drbd_backend stop
> > mach_down[5019]: 2008/02/28_10:38:33 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
> > mach_down[5019]: 2008/02/28_10:38:33 info: mach_down takeover complete for node xen-a1.fra1.mailcluster.
> > heartbeat[4960]: 2008/02/28_10:38:33 info: mach_down takeover complete.
> > heartbeat[4960]: 2008/02/28_10:38:33 info: Initial resource acquisition complete (mach_down)
> > harc[5232]: 2008/02/28_10:38:33 info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp
> > ip-request-resp[5232]: 2008/02/28_10:38:33 received ip-request-resp drbddisk::drbd_backend_2 OK yes
> > ResourceManager[5253]: 2008/02/28_10:38:33 info: Acquiring resource group: xen-b1.fra1.mailcluster drbddisk::drbd_backend_2 xen::backend-B1
> > ResourceManager[5253]: 2008/02/28_10:38:33 info: Running /etc/ha.d/resource.d/drbddisk drbd_backend_2 start
> > ResourceManager[5253]: 2008/02/28_10:38:33 info: Running /etc/ha.d/resource.d/xen backend-B1 start
> > hb_standby[5588]: 2008/02/28_10:39:03 Going standby [foreign].
> > heartbeat[4960]: 2008/02/28_10:39:03 info: xen-b1.fra1.mailcluster wants to go standby [foreign]
> > heartbeat[4960]: 2008/02/28_10:39:13 WARN: No reply to standby request. Standby request cancelled.
> >
> > But after a reboot some minutes earlier, the logfile was flooded with this message:
> >
> > heartbeat[2966]: 2008/02/28_10:15:34 ERROR: glib: Unable to send [-1] ucast packet: No such device
> > heartbeat[2966]: 2008/02/28_10:15:34 ERROR: write failure on ucast eth0.: No such device
> > heartbeat[2966]: 2008/02/28_10:15:34 ERROR: glib: Unable to send [-1] ucast packet: No such device
> > heartbeat[2966]: 2008/02/28_10:15:34 ERROR: write failure on ucast eth0.: No such device
>
> Well, it looks like eth0 doesn't exist.
>
> > I stopped iptables, but the error didn't go away; it only stopped after
> > another reboot. What is the reason for this error?
> >
> > In ha.cf, should both nodes have a "ucast eth0 172.20.2.1" entry?
>
> No. It should be "ucast eth0 node2-ipaddress" on node1 and vice versa
> on node2. To simplify management, you can put both ucast directives on
> both nodes. I believe that this is well documented in ha.cf.
>
> Thanks,
>
> Dejan
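Concretely, Dejan's suggestion amounts to ha.cf lines like the following; a sketch that assumes 172.20.1.1 belongs to xen-a1 and 172.20.2.1 to xen-b1 (the thread does not say which address is which node). Since heartbeat skips a ucast directive that points at the local node's own address, the same two lines can go on both nodes:

    # /etc/ha.d/ha.cf, identical on both nodes:
    ucast eth0 172.20.1.1    # used by xen-b1 to reach xen-a1; ignored on xen-a1
    ucast eth0 172.20.2.1    # used by xen-a1 to reach xen-b1; ignored on xen-b1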
> > Thanks.
> >
> > On Thu, Feb 28, 2008 at 11:18 AM, Dejan Muhamedagic <[EMAIL PROTECTED]> wrote:
> > > Hi,
> > >
> > > On Thu, Feb 28, 2008 at 08:36:33AM +0100, rupert wrote:
> > > > Does no one have any ideas on this matter?
> > >
> > > This is a drbd-related issue. You would be better off in a drbd
> > > forum.
> > >
> > > Thanks,
> > >
> > > Dejan
> > >
> > > > Thanks.
> > > >
> > > > On Tue, Feb 26, 2008 at 12:10 PM, rupert <[EMAIL PROTECTED]> wrote:
> > > > > Hello,
> > > > >
> > > > > I set up a cluster with 2 drbd devices and 2 VMs on each server.
> > > > > When one server goes down, the other should take over the down
> > > > > one's share. The drbd setup goes like this:
> > > > >
> > > > > a -> a
> > > > > b <- b
> > > > >
> > > > > The other machines are not on drbd devices, just some
> > > > > loopback-backed VMs which carry no data; can they be in the
> > > > > heartbeat config?
> > > > >
> > > > > In my haresources I have the following entries on both servers:
> > > > >
> > > > > xen-A1.fra1.mailcluster drbddisk::drbd_backend xen::backend-A1 xen::MX1-A1
> > > > > xen-B1.fra1.mailcluster drbddisk::drbd_backend_2 xen::backend-B1 xen::MX2-B1
> > > > >
> > > > > In ha.cf I set ucast to
> > > > > ucast eth0 172.20.1.1
> > > > > on the first server and
> > > > > ucast eth0 172.20.2.1
> > > > > on the second server.
> > > > >
> > > > > When I restart the ha daemon, it powers down all the VMs and makes
> > > > > all the drbd devices primary on the first server, but they should
> > > > > not all be on the first server:
> > > > >
> > > > > GIT-hash: b3fe2bdfd3b9f7c2f923186883eb9e2a0d3a5b1b build by [EMAIL PROTECTED], 2008-02-13 19:17:43
> > > > > 0: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---
> > > > >     ns:135995280 nr:0 dw:779680 dr:135790386 al:224 bm:8602 lo:0 pe:0 ua:0 ap:0
> > > > >     resync: used:0/31 hits:8442668 misses:8308 starving:0 dirty:0 changed:8308
> > > > >     act_log: used:0/257 hits:136296 misses:224 starving:0 dirty:0 changed:224
> > > > > 1: cs:Connected st:Secondary/Primary ds:UpToDate/UpToDate C r---
> > > > >     ns:0 nr:663968 dw:663968 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
> > > > >     resync: used:0/31 hits:0 misses:0 starving:0 dirty:0 changed:0
> > > > >
> > > > > On my first start, heartbeat told me that the drbddisk is active
> > > > > and it shouldn't be, but it is the main drbddisk on each server;
> > > > > the other one is the backup for failover.
> > > > >
> > > > > Resource drbddisk::drbd_backend_2 is active, and should not be!
> > > > > 2008/02/26_07:42:58 CRITICAL: Non-idle resources can affect data integrity!
> > > > > 2008/02/26_07:42:58 info: If you don't know what this means, then get help!
> > > > > 2008/02/26_07:42:58 info: Read the docs and/or source to /usr/share/heartbeat/ResourceManager for more details.
> > > > > CRITICAL: Resource drbddisk::drbd_backend_2 is active, and should not be!
> > > > > CRITICAL: Non-idle resources can affect data integrity!
> > > > > info: If you don't know what this means, then get help!
> > > > > info: Read the docs and/or the source to /usr/share/heartbeat/ResourceManager for more details.
> > > > > 2008/02/26_07:42:58 CRITICAL: Non-idle resources will affect resource takeback!
> > > > > 2008/02/26_07:42:58 CRITICAL: Non-idle resources may affect data integrity!
> > > > >
> > > > > Thanks for your help.
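The "is active, and should not be" message means heartbeat found drbd_backend_2 already Primary on the node before it had acquired that resource group. One way to check, and if necessary clean up, by hand before starting heartbeat (a sketch; resource names taken from the haresources above):

    # On each node: which role does it currently hold for each resource?
    drbdadm state drbd_backend
    drbdadm state drbd_backend_2

    # If a node is Primary for a resource heartbeat has not assigned to it,
    # stop whatever uses the device (here, the Xen VM) and demote it:
    drbdadm secondary drbd_backend_2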
_______________________________________________
Linux-HA mailing list
[email protected]
http://lists.linux-ha.org/mailman/listinfo/linux-ha
See also: http://linux-ha.org/ReportingProblems
