Hello,
I would like to integrate Heartbeat and OpenSIPS to create High Availability
for my OpenSIPS proxy.
Here is my setup:
OpenSIPS is installed on both boxes, and each box works without any issue on
its own. I will refer to them as Box One and Box Two in order to distinguish
them here. Each OpenSIPS box has its own IP address, XXX.XXX.XXX.XX7 and
XXX.XXX.XXX.XX8 respectively, for accessing the box. The third IP address is
the floating IP address XXX.XXX.XXX.XX9, which is held by the Master/Active
box while the Slave/Passive box is on hot standby. The two boxes are
connected to a LinkSys switch, which provides the required range of IP
addresses and Internet connectivity, as shown here:
          +----------------+
          |   Broad Band   |
          |     Modem      |
          +----------------+
                  |
      +------------------------+
      |     LinkSys Switch     |
      +------------------------+
          |       |       |
          |      XX9      |
          |               |
      +-------+       +-------+
      | Box 1 |       | Box 2 |
      +-------+       +-------+
         XX7             XX8
I have also added the OpenSIPS init script to the Heartbeat resource folder
"/etc/ha.d/resource.d" so that OpenSIPS is started and stopped by the
Heartbeat fail-over process. Testing shows that the OpenSIPS init script is
working.
The two boxes are also connected by a null-modem serial cable, which is used
for the heartbeat signals.
The following are my Heartbeat configuration files:
/etc/ha.d/ha.cf
# Heartbeat logging configuration
debugfile /var/log/ha-debug
logfile /var/log/ha-log
logfacility local0
# Heartbeat cluster members
node One
node Two
# Heartbeat communication timing
keepalive 2
deadtime 32
initdead 64
# Heartbeat communication paths
baud 19200
serial /dev/ttyS0
# Fail back automatically when the preferred node returns - on/off
auto_failback on
/etc/ha.d/haresources
One \
XXX.XXX.XXX.XX9/24/eth0 \
mysqld \
opensips
/etc/ha.d/authkeys
auth 1
1 sha1 HelloWorldPassword
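As a quick sanity check on the timing values in ha.cf above, here is a minimal sketch that parses the directives and tests two rules of thumb: deadtime should be larger than keepalive, and initdead should be at least twice deadtime (the latter is what the sample ha.cf recommends, as far as I recall). The helper names parse_ha_cf and check_timing are my own and not part of Heartbeat, and the sketch assumes the timing values are plain whole seconds.

```python
# Directives copied from the ha.cf shown above.
HA_CF = """\
# Heartbeat cluster members
node One
node Two
# Heartbeat communication timing
keepalive 2
deadtime 32
initdead 64
# Heartbeat communication paths
baud 19200
serial /dev/ttyS0
auto_failback on
"""


def parse_ha_cf(text):
    """Return {directive: [values]}; a directive such as 'node' may repeat."""
    settings = {}
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        parts = line.split(None, 1)
        value = parts[1].strip() if len(parts) > 1 else ""
        settings.setdefault(parts[0], []).append(value)
    return settings


def check_timing(settings):
    """Return a list of warnings (empty when the rules of thumb hold).

    Assumption: values are whole seconds, as in the ha.cf above.
    """
    keepalive = int(settings["keepalive"][0])
    deadtime = int(settings["deadtime"][0])
    initdead = int(settings["initdead"][0])
    problems = []
    if deadtime <= keepalive:
        problems.append("deadtime should be larger than keepalive")
    if initdead < 2 * deadtime:
        problems.append("initdead should be at least twice deadtime")
    return problems


print(check_timing(parse_ha_cf(HA_CF)) or "timing values look consistent")
```

With the values above (2, 32 and 64 seconds) neither rule is violated, so the timing settings do not look like the cause of the delay.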
The initial condition is as follows:
Box One – Master / Active
Box Two – Slave / Passive (Hot Standby)
On Box One, once Heartbeat is stopped, OpenSIPS is shut down, the MySQL
database is stopped, and finally the floating IP address is released. As a
result, Box Two takes over the floating IP address by assigning it to eth0
under the alias eth0:0, starts the MySQL database, and finally starts
OpenSIPS. At this point everything appears to be in perfect working order.
The problem is that the switch continues to send the SIP packets to Box
One! At this point, even though the SIP traffic is still sent to Box One, I
can ssh to the floating IP address and land on Box Two, which confirms that
Box Two has successfully obtained the floating IP address. Somehow, it
appears that only the SIP traffic continues to be sent to Box One, which is
out of commission because Heartbeat was shut down! If I wait a few minutes,
the SIP traffic is eventually sent to Box Two and everything works as it is
supposed to.
I have replaced the LinkSys EtherFast 4124 switch with a Nortel BayStack
70-27T and also with an SMC EZ Switch 108DT, and the result is identical.
The only difference between the switches is how long I have to wait before
the SIP traffic is sent to the right box (Box Two). The waiting time for
Netgear is 120 seconds, while for the LinkSys EF4124 it is 300 seconds. (I
didn't bother to measure the waiting time for the Nortel switch, but it also
appeared to be on the order of a few hundred seconds.)
The following logs from Box One and Box Two cover the scenario described
above.
The following is the log file on Box One:
[r...@one ~]# tail -f /var/log/ha-log
heartbeat[2111]: 2010/11/09_20:40:58 info: Current arena value: 0
heartbeat[2111]: 2010/11/09_20:40:58 info: MSG stats: 0/0 ms age 85675014
[pid2134/HBWRITE]
heartbeat[2111]: 2010/11/09_20:40:58 info: ha_malloc stats: 327/54952
32588/15273 [pid2134/HBWRITE]
heartbeat[2111]: 2010/11/09_20:40:58 info: RealMalloc stats: 41176 total
malloc bytes. pid [2134/HBWRITE]
heartbeat[2111]: 2010/11/09_20:40:58 info: Current arena value: 0
heartbeat[2111]: 2010/11/09_20:40:58 info: MSG stats: 0/0 ms age 85675014
[pid2135/HBREAD]
heartbeat[2111]: 2010/11/09_20:40:58 info: ha_malloc stats: 330/104085
32904/15465 [pid2135/HBREAD]
heartbeat[2111]: 2010/11/09_20:40:58 info: RealMalloc stats: 33764 total
malloc bytes. pid [2135/HBREAD]
heartbeat[2111]: 2010/11/09_20:40:58 info: Current arena value: 0
heartbeat[2111]: 2010/11/09_20:40:58 info: These are nothing to worry about.
************* At This Point the Heartbeat on Box One is Stopped *************
heartbeat[2111]: 2010/11/09_22:58:16 info: Heartbeat shutdown in progress.
(2111)
heartbeat[6872]: 2010/11/09_22:58:16 info: Giving up all HA resources.
ResourceManager[6882]: 2010/11/09_22:58:16 info: Releasing resource group:
One XXX.XXX.XXX.XX9/24/eth0 mysqld opensips
ResourceManager[6882]: 2010/11/09_22:58:16 info: Running
/etc/ha.d/resource.d/opensips stop
ResourceManager[6882]: 2010/11/09_22:58:18 info: Running /etc/init.d/mysqld
stop
ResourceManager[6882]: 2010/11/09_22:58:21 info: Running
/etc/ha.d/resource.d/IPaddr XXX.XXX.XXX.XX9/24/eth0 stop
IPaddr[7029]: 2010/11/09_22:58:21 INFO: /sbin/ifconfig eth0:0
XXX.XXX.XXX.XX9 down
IPaddr[7008]: 2010/11/09_22:58:21 INFO: Success
heartbeat[6872]: 2010/11/09_22:58:21 info: All HA resources relinquished.
heartbeat[2111]: 2010/11/09_22:58:22 WARN: 1 lost packet(s) for [two]
[47444:47446]
heartbeat[2111]: 2010/11/09_22:58:22 info: No pkts missing from two!
heartbeat[2111]: 2010/11/09_22:58:23 info: killing HBWRITE process 2134 with
signal 15
heartbeat[2111]: 2010/11/09_22:58:23 info: killing HBREAD process 2135 with
signal 15
heartbeat[2111]: 2010/11/09_22:58:23 info: killing HBFIFO process 2133 with
signal 15
heartbeat[2111]: 2010/11/09_22:58:23 info: Core process 2135 exited. 3
remaining
heartbeat[2111]: 2010/11/09_22:58:23 info: Core process 2134 exited. 2
remaining
heartbeat[2111]: 2010/11/09_22:58:23 info: Core process 2133 exited. 1
remaining
heartbeat[2111]: 2010/11/09_22:58:23 info: One Heartbeat shutdown complete.
The following is the log file on Box Two:
[r...@two ~]# tail -f /var/log/ha-log
heartbeat[2116]: 2010/11/09_20:48:18 info: Current arena value: 0
heartbeat[2116]: 2010/11/09_20:48:18 info: MSG stats: 0/0 ms age 85496404
[pid2139/HBWRITE]
heartbeat[2116]: 2010/11/09_20:48:18 info: ha_malloc stats: 327/54952
32588/15273 [pid2139/HBWRITE]
heartbeat[2116]: 2010/11/09_20:48:18 info: RealMalloc stats: 41472 total
malloc bytes. pid [2139/HBWRITE]
heartbeat[2116]: 2010/11/09_20:48:18 info: Current arena value: 0
heartbeat[2116]: 2010/11/09_20:48:18 info: MSG stats: 0/0 ms age 85496404
[pid2140/HBREAD]
heartbeat[2116]: 2010/11/09_20:48:18 info: ha_malloc stats: 328/103833
32672/15317 [pid2140/HBREAD]
heartbeat[2116]: 2010/11/09_20:48:18 info: RealMalloc stats: 33084 total
malloc bytes. pid [2140/HBREAD]
heartbeat[2116]: 2010/11/09_20:48:18 info: Current arena value: 0
heartbeat[2116]: 2010/11/09_20:48:18 info: These are nothing to worry about.
************* At This Point the Heartbeat on Box One is Stopped *************
heartbeat[2116]: 2010/11/09_23:09:09 info: Received shutdown notice from
'one'.
heartbeat[2116]: 2010/11/09_23:09:09 info: Resources being acquired from
one.
heartbeat[6886]: 2010/11/09_23:09:09 info: acquire local HA resources
(standby).
heartbeat[6886]: 2010/11/09_23:09:09 info: local HA resource acquisition
completed (standby).
heartbeat[2116]: 2010/11/09_23:09:09 info: Standby resource acquisition done
[foreign].
heartbeat[6887]: 2010/11/09_23:09:09 info: No local resources
[/usr/lib/heartbeat/ResourceManager listkeys two] to acquire.
harc[6906]: 2010/11/09_23:09:09 info: Running /etc/ha.d/rc.d/status
status
mach_down[6916]: 2010/11/09_23:09:09 info: Taking over resource group
XXX.XXX.XXX.XX9/24/eth0
ResourceManager[6936]: 2010/11/09_23:09:09 info: Acquiring resource group:
one XXX.XXX.XXX.XX9/24/eth0 mysqld opensips
IPaddr[6960]: 2010/11/09_23:09:09 INFO: Resource is stopped
ResourceManager[6936]: 2010/11/09_23:09:09 info: Running
/etc/ha.d/resource.d/IPaddr XXX.XXX.XXX.XX9/24/eth0 start
IPaddr[7038]: 2010/11/09_23:09:09 INFO: Using calculated netmask for
XXX.XXX.XXX.XX9: 255.255.255.0
IPaddr[7038]: 2010/11/09_23:09:09 DEBUG: Using calculated broadcast for
XXX.XXX.XXX.XX9: XXX.XXX.XXX.255
IPaddr[7038]: 2010/11/09_23:09:09 INFO: eval /sbin/ifconfig eth0:0
XXX.XXX.XXX.XX9 netmask 255.255.255.0 broadcast XXX.XXX.XXX.255
IPaddr[7038]: 2010/11/09_23:09:09 DEBUG: Sending Gratuitous Arp for
XXX.XXX.XXX.XX9 on eth0:0 [eth0]
IPaddr[7017]: 2010/11/09_23:09:10 INFO: Success
ResourceManager[6936]: 2010/11/09_23:09:10 info: Running /etc/init.d/mysqld
start
ResourceManager[6936]: 2010/11/09_23:09:11 info: Running
/etc/ha.d/resource.d/opensips start
mach_down[6916]: 2010/11/09_23:09:11 info:
/usr/lib/heartbeat/mach_down: nice_failback: foreign resources acquired
mach_down[6916]: 2010/11/09_23:09:11 info: mach_down takeover
complete for node one.
heartbeat[2116]: 2010/11/09_23:09:11 info: mach_down takeover complete.
heartbeat[2116]: 2010/11/09_23:09:42 WARN: node one: is dead
heartbeat[2116]: 2010/11/09_23:09:42 info: Dead node one gave up resources.
heartbeat[2116]: 2010/11/09_23:09:43 info: Link one:/dev/ttyS0 dead.
heartbeat[2139]: 2010/11/09_23:09:44 WARN: glib: TTY write timeout on
[/dev/ttyS0] (no connection or bad cable? [see documentation])
heartbeat[2139]: 2010/11/09_23:09:44 info: glib: See
http://linux-ha.org/FAQ#TTYtimeout for details
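To put a number on the takeover itself, the timestamps in the Box Two log above can be subtracted directly. This is only a minimal sketch: parse_stamp is my own helper name, and the three log lines are copied (unwrapped) from the log above. Only timestamps from the same box are compared, because the two logs are about eleven minutes apart (22:58:16 on Box One versus 23:09:09 on Box Two), so either the clocks are not synchronized or the excerpts come from different runs.

```python
import re
from datetime import datetime

# Three lines copied from the Box Two log above, with the wrapping removed.
LOG_LINES = [
    "heartbeat[2116]: 2010/11/09_23:09:09 info: Received shutdown notice from 'one'.",
    "IPaddr[7038]: 2010/11/09_23:09:09 DEBUG: Sending Gratuitous Arp for "
    "XXX.XXX.XXX.XX9 on eth0:0 [eth0]",
    "heartbeat[2116]: 2010/11/09_23:09:11 info: mach_down takeover complete.",
]

# Heartbeat writes timestamps as YYYY/MM/DD_HH:MM:SS.
STAMP = re.compile(r"\d{4}/\d{2}/\d{2}_\d{2}:\d{2}:\d{2}")


def parse_stamp(line):
    """Extract the Heartbeat timestamp from one log line."""
    match = STAMP.search(line)
    if match is None:
        raise ValueError("no timestamp found in: " + line)
    return datetime.strptime(match.group(0), "%Y/%m/%d_%H:%M:%S")


notice, garp, done = (parse_stamp(line) for line in LOG_LINES)
arp_delay_seconds = (garp - notice).total_seconds()
takeover_seconds = (done - notice).total_seconds()

print(arp_delay_seconds)  # 0.0
print(takeover_seconds)   # 2.0
```

So Heartbeat finishes the takeover within about two seconds of the shutdown notice, which suggests that the wait of several minutes happens somewhere outside Heartbeat's own resource handling.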
Could you please let me know what is going on and how I can get this issue
fixed? I am puzzled, because I have used Heartbeat previously with no issue
in scenarios other than OpenSIPS and SIP traffic.
I have also come across the following links:
http://horms.net/projects/has/html/node8.html IP Address Takeover - Linux HA
http://wiki.wireshark.org/Gratuitous_ARP Gratuitous ARP
But I am under the impression that the gratuitous ARP is already sent by
Heartbeat itself, since I see that the ARP cache is updated immediately
after the fail-over when I run the command "# arp -v -a" on both boxes.
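For reference, this is roughly what a gratuitous ARP announcement for the floating IP looks like on the wire, as I understand it from the links above: a broadcast Ethernet frame carrying an ARP packet whose sender and target protocol addresses are both the floating IP. The sketch below only builds the bytes and does not send anything. The MAC and IP values are placeholders (the real addresses are redacted in this post), build_gratuitous_arp is a hypothetical helper name, and I show the request form (opcode 1), although some implementations announce with a reply (opcode 2) instead.

```python
import socket
import struct

BROADCAST_MAC = b"\xff" * 6
ETHERTYPE_ARP = 0x0806


def mac_to_bytes(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes(int(part, 16) for part in mac.split(":"))


def build_gratuitous_arp(sender_mac, ip):
    """Build an Ethernet frame holding a gratuitous ARP request for `ip`."""
    mac = mac_to_bytes(sender_mac)
    addr = socket.inet_aton(ip)
    ethernet_header = BROADCAST_MAC + mac + struct.pack("!H", ETHERTYPE_ARP)
    arp_packet = struct.pack(
        "!HHBBH6s4s6s4s",
        1,            # hardware type: Ethernet
        0x0800,       # protocol type: IPv4
        6,            # hardware address length
        4,            # protocol address length
        1,            # opcode: request
        mac,          # sender hardware address: the new owner of the IP
        addr,         # sender protocol address: the floating IP
        b"\x00" * 6,  # target hardware address: unused here
        addr,         # target protocol address: the floating IP again
    )
    return ethernet_header + arp_packet


# Placeholder values only: a locally administered MAC and a documentation IP.
frame = build_gratuitous_arp("02:00:00:00:00:01", "192.0.2.9")
print(len(frame))  # 42
print(frame.hex())
```

Any device that receives this broadcast and keeps an ARP entry for the floating IP is expected to update it to the sender's MAC address, which matches what I see in the ARP caches of the two boxes.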
Thank you very much and have a great day.
Avestan :-)