Hi,

On Thu, Dec 13, 2007 at 10:19:18AM -0600, Jeremy Alons wrote:
> Greetings,
> 
> My apologies for the lengthy first message to the list, but I'm at my wits'
> end, and prefer to supply too much information instead of too little.
> Ha-debug is included as a link at the end of this message.
> 
> I've got a fresh pair of Ubuntu boxes (7.10) I'm trying to get heartbeat up
> and running on.  
> 
> Both machines are identical, communication has been verified on eth0 and
> eth1, unicast traffic appears functional on eth1.
> 
> Some background info:
> node1: ldirector01.EQX eth0: 192.168.38.25/24 eth1: 192.168.43.25/24
> node2: ldirector02.EQX eth0: 192.168.38.26/24 eth1: 192.168.43.26/24
> VIP: 192.168.38.40/24
> 
> DNS entries return the IP address bound to eth0 for these hostnames.  I've
> attached configurations to the end of the message, along with logs from the
> primary node.
> 
> The problem is when I start heartbeat on either node the IP address defined
> in haresources isn't being bound to the system.  I'm assuming it's going to
> come up as eth0:0 (and subsequent definitions in haresources are going to
> increment the alias by 1), however it isn't playing nice.  I can manually
> bring up the IP address:
> 
> [EMAIL PROTECTED]:/var/log# ifconfig eth0:0 up 192.168.38.40 netmask
> 255.255.255.0
> SIOCSIFFLAGS: Cannot assign requested address

What is the exit code? If it's not zero, then you will have to

> (The SIOCSIFFLAGS error appears to be a bug in Ubuntu's ifup/ifdown script)

ask the Ubuntu people to fix it.
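
The exit code is what the shell leaves in $? right after the command. A small sketch for capturing it; the helper name report_rc is made up for illustration, and the ifconfig arguments are the ones from your message:

```shell
# report_rc: run the given command, print its exit code, and return it.
# (hypothetical helper, not part of heartbeat)
report_rc() {
    rc=0
    "$@" || rc=$?
    echo "exit code: $rc"
    return "$rc"
}

# On the node, as root, this would be something like:
#   report_rc ifconfig eth0:0 192.168.38.40 netmask 255.255.255.0 up
```

A zero here would suggest the SIOCSIFFLAGS message is only noise; anything else means the command itself is reporting a failure.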

> However when I do this (and have heartbeat started on both nodes) and I
> attempt to fail over to the secondary node (either with
> /etc/init.d/heartbeat stop or simulating a power failure) the IP address
> does not get bound to the second node.
> 
> To make things more confusing when I start heartbeat on the secondary node
> after manually binding the VIP up on the primary node heartbeat takes the
> VIP offline (ResourceManager appears to hate me, in ha-log, at
> 2007/12/13_10:04:57).
> 
> I'm looking for suggestions on where to go from here, and why
> ResourceManager apparently only wants to remove IPs and not add them when it
> starts.
> 
> 
> Ha.cf:
> Node1:
> [EMAIL PROTECTED]:/etc/ha.d# cat ha.cf | grep -v \#
> debugfile /var/log/ha-debug
> logfile    /var/log/ha-log
> logfacility    daemon
> keepalive 2
> deadtime 30
> warntime 10
> initdead 120
> udpport    694
> ucast eth1 192.168.43.26
> auto_failback on
> node    ldirector01.EQX
> node    ldirector02.EQX
> ping_group router_group 192.168.38.1
> respawn hacluster /usr/lib/heartbeat/ipfail
> debug 1
> 
> Node2:
> [EMAIL PROTECTED]:/etc/ha.d# cat ha.cf | grep -v \#
> debugfile /var/log/ha-debug
> logfile    /var/log/ha-log
> logfacility    daemon
> keepalive 2
> deadtime 30
> warntime 10
> initdead 120
> udpport    694
> ucast eth1 192.168.43.25
> auto_failback on
> node    ldirector01.EQX
> node    ldirector02.EQX
> ping_group router_group 192.168.38.1
> respawn hacluster /usr/lib/heartbeat/ipfail
> debug 1
> 
> Haresources has only a single definition, super simple while testing:
> node1: ldirector02.EQX IPaddr::192.168.38.40/24/eth0
> node2: ldirector01.EQX IPaddr::192.168.38.40/24/eth0

This is no good. Either you have one virtual IP or two. If one,
then the first line alone will suffice; if two, you need two
different addresses.
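
With a single virtual IP, haresources is expected to be the same on both nodes, with one line naming the preferred owner. A minimal sketch for comparing the two copies, assuming node2's file has already been copied to a local path; the function names and the /tmp path are made up for illustration:

```shell
# normalize FILE: print FILE without comment lines and blank lines,
# so only the resource definitions are compared.
normalize() {
    sed -e '/^[[:space:]]*#/d' -e '/^[[:space:]]*$/d' "$1"
}

# same_resources FILE_A FILE_B: succeed only if both files
# define exactly the same resource lines.
same_resources() {
    a=$(normalize "$1")
    b=$(normalize "$2")
    [ "$a" = "$b" ]
}

# Example, with node2's copy fetched to /tmp beforehand (assumed path):
#   same_resources /etc/ha.d/haresources /tmp/haresources.node2 \
#       && echo "identical" || echo "DIFFERENT"
```

The two files quoted above would come out as different, since each one names a different node in front of the same address.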

Thanks,

Dejan

> Authkeys are mode 600 on both, both using auth 3, both defined as an md5 on
> the same string.
> 
> Logs:
> node1's /var/log/ha-log:
> heartbeat[15380]: 2007/12/13_09:54:44 info: AUTH: i=1: key = 0x6d9a98,
> auth=0x2ae8dd26a470, authname=crc
> heartbeat[15380]: 2007/12/13_09:54:44 info: AUTH: i=2: key = 0x6da468,
> auth=0x2ae8dd46def0, authname=sha1
> heartbeat[15380]: 2007/12/13_09:54:44 info: AUTH: i=3: key = 0x6dae68,
> auth=0x2ae8dd66ee10, authname=md5
> heartbeat[15380]: 2007/12/13_09:54:44 WARN: Core dumps could be lost if
> multiple dumps occur.
> heartbeat[15380]: 2007/12/13_09:54:44 WARN: Consider setting non-default
> value in /proc/sys/kernel/core_pattern (or equivalent) for maximum
> supportability
> heartbeat[15380]: 2007/12/13_09:54:44 WARN: Consider setting
> /proc/sys/kernel/core_uses_pid (or equivalent) to 1 for maximum
> supportability
> heartbeat[15380]: 2007/12/13_09:54:44 info: Version 2 support: false
> heartbeat[15380]: 2007/12/13_09:54:44 WARN: Logging daemon is disabled
> --enabling logging daemon is recommended
> heartbeat[15380]: 2007/12/13_09:54:44 info: **************************
> heartbeat[15380]: 2007/12/13_09:54:44 info: Configuration validated.
> Starting heartbeat 2.1.2
> heartbeat[15381]: 2007/12/13_09:54:44 info: heartbeat: version 2.1.2
> heartbeat[15381]: 2007/12/13_09:54:44 info: Heartbeat generation: 1197490909
> heartbeat[15381]: 2007/12/13_09:54:44 info: G_main_add_TriggerHandler: Added
> signal manual handler
> heartbeat[15381]: 2007/12/13_09:54:44 info: G_main_add_TriggerHandler: Added
> signal manual handler
> heartbeat[15381]: 2007/12/13_09:54:44 info: Removing
> /var/run/heartbeat/rsctmp failed, recreating.
> heartbeat[15381]: 2007/12/13_09:54:44 info: glib: ucast: write socket
> priority set to IPTOS_LOWDELAY on eth1
> heartbeat[15381]: 2007/12/13_09:54:44 info: glib: ucast: bound send socket
> to device: eth1
> heartbeat[15381]: 2007/12/13_09:54:44 info: glib: ucast: bound receive
> socket to device: eth1
> heartbeat[15381]: 2007/12/13_09:54:44 info: glib: ucast: started on port 694
> interface eth1 to 192.168.43.26
> heartbeat[15381]: 2007/12/13_09:54:44 info: glib: ping group heartbeat
> started.
> heartbeat[15381]: 2007/12/13_09:54:44 info: G_main_add_SignalHandler: Added
> signal handler for signal 17
> heartbeat[15381]: 2007/12/13_09:54:44 info: Local status now set to: 'up'
> heartbeat[15381]: 2007/12/13_09:54:45 info: Link router_group:router_group
> up.
> heartbeat[15381]: 2007/12/13_09:54:45 info: Status update for node
> router_group: status ping
> 
> <start heartbeat on secondary node>
> 
> heartbeat[15381]: 2007/12/13_10:04:44 info: Daily informational memory
> statistics
> heartbeat[15381]: 2007/12/13_10:04:44 info: MSG stats: 101/680 ms age 0
> [pid15381/MST_CONTROL]
> heartbeat[15381]: 2007/12/13_10:04:44 info: cl_malloc stats: 3460/18414
> 383248/179790 [pid15381/MST_CONTROL]
> heartbeat[15381]: 2007/12/13_10:04:44 info: RealMalloc stats: 397472 total
> malloc bytes. pid [15381/MST_CONTROL]
> heartbeat[15381]: 2007/12/13_10:04:44 info: Current arena value: 0
> heartbeat[15381]: 2007/12/13_10:04:44 info: MSG stats: 0/2 ms age 479440
> [pid15385/HBFIFO]
> heartbeat[15381]: 2007/12/13_10:04:44 info: cl_malloc stats: 371/458
> 45524/21281 [pid15385/HBFIFO]
> heartbeat[15381]: 2007/12/13_10:04:44 info: RealMalloc stats: 48096 total
> malloc bytes. pid [15385/HBFIFO]
> heartbeat[15381]: 2007/12/13_10:04:44 info: Current arena value: 0
> heartbeat[15381]: 2007/12/13_10:04:44 info: MSG stats: 0/0 ms age
> 17234757580 [pid15386/HBWRITE]
> heartbeat[15381]: 2007/12/13_10:04:44 info: cl_malloc stats: 372/794
> 45808/21481 [pid15386/HBWRITE]
> heartbeat[15381]: 2007/12/13_10:04:44 info: RealMalloc stats: 54488 total
> malloc bytes. pid [15386/HBWRITE]
> heartbeat[15381]: 2007/12/13_10:04:44 info: Current arena value: 0
> heartbeat[15381]: 2007/12/13_10:04:44 info: MSG stats: 0/0 ms age
> 17234757580 [pid15387/HBREAD]
> heartbeat[15381]: 2007/12/13_10:04:44 info: cl_malloc stats: 372/433
> 37680/17448 [pid15387/HBREAD]
> heartbeat[15381]: 2007/12/13_10:04:44 info: RealMalloc stats: 37772 total
> malloc bytes. pid [15387/HBREAD]
> heartbeat[15381]: 2007/12/13_10:04:44 info: Current arena value: 0
> heartbeat[15381]: 2007/12/13_10:04:44 info: MSG stats: 0/649 ms age 1960
> [pid15388/HBWRITE]
> heartbeat[15381]: 2007/12/13_10:04:44 info: cl_malloc stats: 374/17080
> 45992/21609 [pid15388/HBWRITE]
> heartbeat[15381]: 2007/12/13_10:04:44 info: RealMalloc stats: 59820 total
> malloc bytes. pid [15388/HBWRITE]
> heartbeat[15381]: 2007/12/13_10:04:44 info: Current arena value: 0
> heartbeat[15381]: 2007/12/13_10:04:44 info: MSG stats: 0/306 ms age 1960
> [pid15389/HBREAD]
> heartbeat[15381]: 2007/12/13_10:04:44 info: cl_malloc stats: 375/6556
> 46084/21673 [pid15389/HBREAD]
> heartbeat[15381]: 2007/12/13_10:04:44 info: RealMalloc stats: 48220 total
> malloc bytes. pid [15389/HBREAD]
> heartbeat[15381]: 2007/12/13_10:04:44 info: Current arena value: 0
> heartbeat[15381]: 2007/12/13_10:04:44 info: These are nothing to worry
> about.
> heartbeat[15381]: 2007/12/13_10:04:55 info: Link ldirector02.eqx:eth1 up.
> heartbeat[15381]: 2007/12/13_10:04:55 info: Link ldirector02.eqx:eth1 up.
> heartbeat[15381]: 2007/12/13_10:04:55 info: Status update for node
> ldirector02.eqx: status init
> heartbeat[15381]: 2007/12/13_10:04:55 info: Status update for node
> ldirector02.eqx: status up
> harc[15463]:    2007/12/13_10:04:55 info: Running /etc/ha.d/rc.d/status
> status
> heartbeat[15381]: 2007/12/13_10:04:55 info: Exiting status process 15463
> returned rc 0.
> harc[15472]:    2007/12/13_10:04:55 info: Running /etc/ha.d/rc.d/status
> status
> heartbeat[15381]: 2007/12/13_10:04:55 info: Exiting status process 15472
> returned rc 0.
> heartbeat[15381]: 2007/12/13_10:04:56 info: Status update for node
> ldirector02.eqx: status active
> heartbeat[15381]: 2007/12/13_10:04:56 info: all clients are now paused
> heartbeat[15381]: 2007/12/13_10:04:56 info: AnnounceTakeover(local 1,
> foreign 1, reason 'T_RESOURCES(us)' (1))
> harc[15480]:    2007/12/13_10:04:56 info: Running /etc/ha.d/rc.d/status
> status
> heartbeat[15381]: 2007/12/13_10:04:56 info: Exiting status process 15480
> returned rc 0.
> heartbeat[15381]: 2007/12/13_10:04:57 info: other_holds_resources: 0
> heartbeat[15381]: 2007/12/13_10:04:57 info: remote resource transition
> completed.
> heartbeat[15381]: 2007/12/13_10:04:57 info: AnnounceTakeover(local 1,
> foreign 1, reason 'T_RESOURCES(us)' (1))
> heartbeat[15381]: 2007/12/13_10:04:57 info: ldirector01.eqx wants to go
> standby [foreign]
> heartbeat[15381]: 2007/12/13_10:04:57 info: i_hold_resources: 3
> heartbeat[15381]: 2007/12/13_10:04:57 info: New standby state: 1
> heartbeat[15381]: 2007/12/13_10:04:57 info: other_holds_resources: 0
> heartbeat[15381]: 2007/12/13_10:04:57 info: standby: ldirector02.eqx can
> take our foreign resources
> heartbeat[15381]: 2007/12/13_10:04:57 info: AnnounceTakeover(local 1,
> foreign 1, reason 'T_RESOURCES(us)' (1))
> heartbeat[15381]: 2007/12/13_10:04:57 info: New standby state: 1
> heartbeat[15488]: 2007/12/13_10:04:57 info: give up foreign HA resources
> (standby).
> heartbeat[15488]: 2007/12/13_10:04:57 info: go_standby: who: 1 resource set:
> foreign
> heartbeat[15488]: 2007/12/13_10:04:57 info: go_standby: (query/action):
> (otherkeys/givegroup)
> ResourceManager[15499]: 2007/12/13_10:04:57 info: Releasing resource group:
> ldirector02.eqx IPaddr::192.168.38.40/24/eth0
> ResourceManager[15499]: 2007/12/13_10:04:57 info: Running
> /etc/ha.d/resource.d/IPaddr 192.168.38.40/24/eth0 stop
> IPaddr[15533]:  2007/12/13_10:04:57 info: /sbin/route -n del -host
> 192.168.38.40
> IPaddr[15533]:  2007/12/13_10:04:57 info: /sbin/ifconfig eth0:0 down
> IPaddr[15533]:  2007/12/13_10:04:57 info: IP Address 192.168.38.40 released
> heartbeat[15488]: 2007/12/13_10:04:57 info: foreign HA resource release
> completed (standby).
> heartbeat[15488]: 2007/12/13_10:04:57 info: FIFO message [type
> ask_resources] written rc=51
> heartbeat[15381]: 2007/12/13_10:04:57 info: Local standby process completed
> [foreign].
> heartbeat[15381]: 2007/12/13_10:04:57 info: New standby state: 3
> heartbeat[15381]: 2007/12/13_10:04:57 info: Exiting go_standby process 15488
> returned rc 0.
> heartbeat[15381]: 2007/12/13_10:04:58 info: all clients are now resumed
> heartbeat[15381]: 2007/12/13_10:04:58 WARN: 1 lost packet(s) for
> [ldirector02.eqx] [12:14]
> heartbeat[15381]: 2007/12/13_10:04:58 info: remote resource transition
> completed.
> heartbeat[15381]: 2007/12/13_10:04:58 info: AnnounceTakeover(local 1,
> foreign 1, reason 'T_RESOURCES(us)' (1))
> heartbeat[15381]: 2007/12/13_10:04:58 info: other_holds_resources: 1
> heartbeat[15381]: 2007/12/13_10:04:58 info: No pkts missing from
> ldirector02.eqx!
> heartbeat[15381]: 2007/12/13_10:04:58 info: Other node completed standby
> takeover of foreign resources.
> heartbeat[15381]: 2007/12/13_10:04:58 info: AnnounceTakeover(local 1,
> foreign 1, reason 'T_RESOURCES(us)' (1))
> heartbeat[15381]: 2007/12/13_10:04:58 info: New standby state: 0
> heartbeat[15381]: 2007/12/13_10:04:58 info: other_holds_resources: 1
> 
> /var/log/ha-debug.log: http://jalons.net/ha-debug.log
> 
> -- 
> Jeremy Alons
> Systems Administrator
> 866 839 1100 ext 3286
> 773 435 3286 direct
> 773 435 3232 fax
> 
> thinkorswim,inc.
> 600 West Chicago Ave, Suite #100
> Chicago, IL 60610
> 
> Member FINRA/SIPC/NFA
> trademark, all rights reserved
> ------------------------------
> This e-mail is sent by a financial firm and contains information that may be
> privileged and confidential.
> 
> 
> 
> 
> _______________________________________________
> Linux-HA mailing list
> [email protected]
> http://lists.linux-ha.org/mailman/listinfo/linux-ha
> See also: http://linux-ha.org/ReportingProblems