Hi guys,
I've got ldirectord set up inside heartbeat and it fails over perfectly.
I also have 3 VIPs in the resource group, but only one of them is used
for ldirectord.
ldirectord.cf looks like this:
# Global Directives
checktimeout=2
checkinterval=2
logfile="local7"

# heartbeat.example.com
virtual=172.28.185.49:389
        protocol=tcp
        scheduler=lc
        checktype=connect
        checkport=389
        #negotiatetimeout=10
        real=172.28.185.37:389 masq
        real=172.28.185.38:389 masq
        service=ldap
        protocol=tcp
        checktimeout=10
        checkinterval=10
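
For anyone not familiar with checktype=connect: as far as I understand it, the health check is just a plain TCP connect to each real server on the check port, with the configured timeout. Here is a rough Python sketch of that idea. It is not ldirectord's actual code, and the function name tcp_connect_check is made up for illustration.

```python
import socket


def tcp_connect_check(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper: it only approximates what a connect-style health
    check does, it is not taken from ldirectord.
    """
    try:
        # create_connection raises OSError on refusal, timeout or no route.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Example use against the real servers from the config above:
#   tcp_connect_check("172.28.185.37", 389, timeout=10)
#   tcp_connect_check("172.28.185.38", 389, timeout=10)
```

Running this from the backup node right after a failover would at least show whether the director itself can still reach the LDAP port on both real servers.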
Now when running on the primary node, connections come in and get
happily routed to .37 or .38. So far so good. If, however, I initiate a
failover I immediately have issues.
Initially, what I see take place on the failover is what I SHOULD see.
The resources fail over perfectly and ipvsadm shows the LVS routing
table like this:
[r...@lvsuat1b ha.d]# ipvsadm
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  ldpuat.vip.intranet.aeroplan lc
  -> gasdyul9300602.intranet.aero Masq    1      0          0
  -> gasdyul9300601.intranet.aero Masq    1      0          0
TCP  esbuat1.vip.intranet.aeropla rr
Again, so far so good. But what I see happening is that no connections
are able to get through. The InActConn count climbs with every attempted
connection, and nothing reaches the backend LDAP servers. After about
20-25 minutes, connections slowly begin to return and things seem to be
normal again.
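
To put numbers on the InActConn climb, the counters can be pulled out of the ipvsadm listing and logged over time. The snippet below is only a sketch: it assumes the column layout shown in the output above (real-server lines start with "->" and end with Weight, ActiveConn, InActConn), and parse_ipvsadm is a name I made up.

```python
def parse_ipvsadm(text):
    """Parse an ipvsadm listing into one dict per real server.

    Assumes the layout shown above: a 'TCP'/'UDP' line names the virtual
    service, and each following '->' line holds
    real server, forward method, weight, ActiveConn, InActConn.
    """
    rows = []
    virtual = None
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] in ("TCP", "UDP"):
            virtual = fields[1]
        elif fields[0] == "->" and len(fields) == 6 and fields[3].isdigit():
            # The column header line is skipped because 'Weight' is not a digit.
            rows.append({
                "virtual": virtual,
                "real": fields[1],
                "forward": fields[2],
                "weight": int(fields[3]),
                "active": int(fields[4]),
                "inactive": int(fields[5]),
            })
    return rows
```

Feeding it the listing every few seconds during the failover window would show whether both real servers accumulate inactive connections at the same rate.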
Can someone explain to me what I'm missing here in my setup? Must be
something. I should note that immediately after the failover, and during
the time I am having trouble, I am able to ssh to the VIP and correctly
land on the proper server, i.e. the backup node. So I think this kinda
rules out some weird ARP table issue on a switch somewhere.
Help me out here guys, where do I look?
Mike