Hmm, I just restarted the second server to check whether heartbeat moves the VM
to server 1.
I couldn't find any info about that in the log files on the first server
(something like "taking over backend-B1"), and one VM did not start. But some
time after the reboot of server 2 it correctly starts backend-B1:
heartbeat[4959]: 2008/02/28_10:36:19 WARN: Logging daemon is disabled --enabling logging daemon is recommended
heartbeat[4959]: 2008/02/28_10:36:19 info: **************************
heartbeat[4959]: 2008/02/28_10:36:19 info: Configuration validated. Starting heartbeat 2.1.2
heartbeat[4960]: 2008/02/28_10:36:19 info: heartbeat: version 2.1.2
heartbeat[4960]: 2008/02/28_10:36:19 info: Heartbeat generation: 1202824451
heartbeat[4960]: 2008/02/28_10:36:19 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat[4960]: 2008/02/28_10:36:19 info: G_main_add_TriggerHandler: Added signal manual handler
heartbeat[4960]: 2008/02/28_10:36:19 info: Removing /var/run/heartbeat/rsctmp failed, recreating.
heartbeat[4960]: 2008/02/28_10:36:19 info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth0
heartbeat[4960]: 2008/02/28_10:36:19 info: glib: ucast: bound send socket to device: eth0
heartbeat[4960]: 2008/02/28_10:36:19 info: glib: ucast: bound receive socket to device: eth0
heartbeat[4960]: 2008/02/28_10:36:19 info: glib: ucast: started on port 694 interface eth0 to 172.20.2.1
heartbeat[4960]: 2008/02/28_10:36:19 info: G_main_add_SignalHandler: Added signal handler for signal 17
heartbeat[4960]: 2008/02/28_10:36:19 info: Local status now set to: 'up'
heartbeat[4960]: 2008/02/28_10:38:20 WARN: node xen-a1.fra1.mailcluster: is dead
heartbeat[4960]: 2008/02/28_10:38:20 info: Comm_now_up(): updating status to active
heartbeat[4960]: 2008/02/28_10:38:20 info: Local status now set to: 'active'
heartbeat[4960]: 2008/02/28_10:38:20 WARN: No STONITH device configured.
heartbeat[4960]: 2008/02/28_10:38:20 WARN: Shared disks are not protected.
heartbeat[4960]: 2008/02/28_10:38:20 info: Resources being acquired from xen-a1.fra1.mailcluster.
harc[4989]: 2008/02/28_10:38:20 info: Running /etc/ha.d/rc.d/status status
heartbeat[4990]: 2008/02/28_10:38:20 info: Local Resource acquisition completed.
mach_down[5019]: 2008/02/28_10:38:20 info: Taking over resource group drbddisk::drbd_backend
ResourceManager[5073]: 2008/02/28_10:38:20 info: Acquiring resource group: xen-a1.fra1.mailcluster drbddisk::drbd_backend xen::backend-A1
ResourceManager[5073]: 2008/02/28_10:38:20 info: Running /etc/ha.d/resource.d/drbddisk drbd_backend start
heartbeat[4960]: 2008/02/28_10:38:30 info: Local Resource acquisition completed. (none)
heartbeat[4960]: 2008/02/28_10:38:30 info: local resource transition completed.
ResourceManager[5073]: 2008/02/28_10:38:32 ERROR: Return code 1 from /etc/ha.d/resource.d/drbddisk
ResourceManager[5073]: 2008/02/28_10:38:32 CRIT: Giving up resources due to failure of drbddisk::drbd_backend
ResourceManager[5073]: 2008/02/28_10:38:32 info: Releasing resource group: xen-a1.fra1.mailcluster drbddisk::drbd_backend xen::backend-A1
ResourceManager[5073]: 2008/02/28_10:38:32 info: Running /etc/ha.d/resource.d/xen backend-A1 stop
ResourceManager[5073]: 2008/02/28_10:38:33 info: Running /etc/ha.d/resource.d/drbddisk drbd_backend stop
mach_down[5019]: 2008/02/28_10:38:33 info: /usr/share/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down[5019]: 2008/02/28_10:38:33 info: mach_down takeover complete for node xen-a1.fra1.mailcluster.
heartbeat[4960]: 2008/02/28_10:38:33 info: mach_down takeover complete.
heartbeat[4960]: 2008/02/28_10:38:33 info: Initial resource acquisition complete (mach_down)
harc[5232]: 2008/02/28_10:38:33 info: Running /etc/ha.d/rc.d/ip-request-resp ip-request-resp
ip-request-resp[5232]: 2008/02/28_10:38:33 received ip-request-resp drbddisk::drbd_backend_2 OK yes
ResourceManager[5253]: 2008/02/28_10:38:33 info: Acquiring resource group: xen-b1.fra1.mailcluster drbddisk::drbd_backend_2 xen::backend-B1
ResourceManager[5253]: 2008/02/28_10:38:33 info: Running /etc/ha.d/resource.d/drbddisk drbd_backend_2 start
ResourceManager[5253]: 2008/02/28_10:38:33 info: Running /etc/ha.d/resource.d/xen backend-B1 start
hb_standby[5588]: 2008/02/28_10:39:03 Going standby [foreign].
heartbeat[4960]: 2008/02/28_10:39:03 info: xen-b1.fra1.mailcluster wants to go standby [foreign]
heartbeat[4960]: 2008/02/28_10:39:13 WARN: No reply to standby request. Standby request cancelled
But after a reboot a few minutes earlier, the log file was flooded with this
message:
heartbeat[2966]: 2008/02/28_10:15:34 ERROR: glib: Unable to send [-1] ucast packet: No such device
heartbeat[2966]: 2008/02/28_10:15:34 ERROR: write failure on ucast eth0.: No such device
heartbeat[2966]: 2008/02/28_10:15:34 ERROR: glib: Unable to send [-1] ucast packet: No such device
heartbeat[2966]: 2008/02/28_10:15:34 ERROR: write failure on ucast eth0.: No such device
I stopped iptables, but the error didn't go away; it only stopped after
another reboot. What is the reason for this error?
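For what it's worth, "No such device" from the ucast plugin means the interface named in ha.cf did not exist when heartbeat tried to write to it; on a Xen dom0 this can happen while the network scripts are re-bridging/renaming eth0 at boot. A quick sanity check you could run before starting heartbeat — just a sketch, where `check_iface` is a made-up helper, not part of heartbeat:

```shell
# Sketch: confirm the NIC named in ha.cf's "ucast" line exists before
# starting heartbeat. /sys/class/net is Linux-specific.
check_iface() {
    [ -d "/sys/class/net/$1" ]
}

IFACE=eth0   # the device from "ucast eth0 ..." in ha.cf
if check_iface "$IFACE"; then
    echo "$IFACE present"
else
    echo "$IFACE missing: ucast writes will fail with 'No such device'" >&2
fi
```

If the interface really was absent at boot, ordering the heartbeat init script strictly after networking (and after any Xen bridge setup) would be the thing to look at.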
In ha.cf, should both nodes have a "ucast eth0 172.20.2.1" entry?
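As far as I know, each ucast line should point at the peer's address, and heartbeat ignores ucast packets addressed to one of its own IPs, so a common approach is to keep an identical ha.cf on both nodes listing both addresses. A sketch only (assuming 172.20.1.1 is server 1's interconnect IP and 172.20.2.1 is server 2's, as in this thread), not verified against your heartbeat version:

```
# /etc/ha.d/ha.cf -- sketch; same file on both nodes.
# heartbeat skips the ucast line that matches its own address.
ucast eth0 172.20.1.1   # server 1's interconnect IP
ucast eth0 172.20.2.1   # server 2's interconnect IP
```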
thx
On Thu, Feb 28, 2008 at 11:18 AM, Dejan Muhamedagic <[EMAIL PROTECTED]> wrote:
> Hi,
>
>
> On Thu, Feb 28, 2008 at 08:36:33AM +0100, rupert wrote:
> > Does no one have any ideas on this matter?
>
> This is a drbd related issue. You should be better off in a drbd
> forum.
>
> Thanks,
>
> Dejan
>
>
>
> > thx
> >
> > On Tue, Feb 26, 2008 at 12:10 PM, rupert <[EMAIL PROTECTED]> wrote:
> > > Hello,
> > >
> > > I set up a cluster with two drbd devices and two VMs on each server.
> > > When one server goes down, the other should take over the part of the
> down one.
> > > The drbd goes like this:
> > > a -> a
> > > b <- b
> > >
> > > The other machines are not drbd devices, just some loopback VMs which
> > > carry no data; can they be in the config for heartbeat?
> > >
> > > In my haresources I have the following entries on both servers:
> > >
> > > xen-A1.fra1.mailcluster drbddisk::drbd_backend xen::backend-A1 xen::MX1-A1
> > > xen-B1.fra1.mailcluster drbddisk::drbd_backend_2 xen::backend-B1 xen::MX2-B1
> > >
> > > in ha.cf on the first server I set ucast to
> > > ucast eth0 172.20.1.1
> > > and
> > > ucast eth0 172.20.2.1
> > > on the second server
> > >
> > > When I restart the ha daemon it powers down all the VMs and makes all
> > > the drbd devices primary on the first server, but they should not all
> > > be primary there.
> > >
> > > GIT-hash: b3fe2bdfd3b9f7c2f923186883eb9e2a0d3a5b1b build by [EMAIL PROTECTED], 2008-02-13 19:17:43
> > > 0: cs:Connected st:Primary/Secondary ds:UpToDate/UpToDate C r---
> > > ns:135995280 nr:0 dw:779680 dr:135790386 al:224 bm:8602 lo:0 pe:0 ua:0 ap:0
> > > resync: used:0/31 hits:8442668 misses:8308 starving:0 dirty:0 changed:8308
> > > act_log: used:0/257 hits:136296 misses:224 starving:0 dirty:0 changed:224
> > > 1: cs:Connected st:Secondary/Primary ds:UpToDate/UpToDate C r---
> > > ns:0 nr:663968 dw:663968 dr:0 al:0 bm:0 lo:0 pe:0 ua:0 ap:0
> > > resync: used:0/31 hits:0 misses:0 starving:0 dirty:0 changed:0
> > >
> > > On my first start heartbeat told me that the drbddisk is active and it
> > > shouldn't be, but that is the main drbd disk on each server; the other
> > > one is the backup for failover.
> > >
> > > Resource drbddisk::drbd_backend_2 is active, and should not be!
> > > 2008/02/26_07:42:58 CRITICAL: Non-idle resources can affect data integrity!
> > > 2008/02/26_07:42:58 info: If you don't know what this means, then get help!
> > > 2008/02/26_07:42:58 info: Read the docs and/or source to /usr/share/heartbeat/ResourceManager for more details.
> > > CRITICAL: Resource drbddisk::drbd_backend_2 is active, and should not be!
> > > CRITICAL: Non-idle resources can affect data integrity!
> > > info: If you don't know what this means, then get help!
> > > info: Read the docs and/or the source to /usr/share/heartbeat/ResourceManager for more details.
> > > 2008/02/26_07:42:58 CRITICAL: Non-idle resources will affect resource takeback!
> > > 2008/02/26_07:42:58 CRITICAL: Non-idle resources may affect data integrity!
> > >
> > >
> > > thx for your help
> > >
> > _______________________________________________
> > Linux-HA mailing list
> > [email protected]
> > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > See also: http://linux-ha.org/ReportingProblems
>
> --
> Dejan