> So, heartbeat restart fixes the problem? That's strange. Apart from
> the heartbeating, Heartbeat is just an init system.
>
> Once it starts resources (whatever you put in haresources), the
> resources are on their own. You can even test your setup by shutting
> down Heartbeat everywhere and setting the alias IP address by hand.
> That shouldn't make any difference whatsoever to that proxy thing.
So if I understand this now, all Heartbeat does is start and stop
whatever is in haresources? If that's the case, and all I have in it is
just the one IP address, then this is starting to sound like a Xen
virtual node problem.

> > Thanks for getting back to me. I am using haresources which looks
> > like this for LB1 & LB2:
> > lb1.tlthost.net 192.168.31.100
>
> There's only the IP address resource.
>
> > I think I'm using 2.1.3-2. It's the version for Ubuntu 8.04 that's
> > listed. I can't seem to find how to check it on my server.
> > Here are the two ha.cf files:
> > ***************** lb1 *********************
> > debugfile /var/log/ha-debug
> > logfile /var/log/ha-log
> > logfacility local0
> > keepalive 2
> > deadtime 10
> > udpport 694
> > bcast eth0
> > mcast eth0 225.0.0.1 694 1 0
> > ucast eth0 192.168.31.211
> > auto_failback on
> > udp eth0
> > node lb1.tlthost.net
> > node lb2.tlthost.net
> > respawn hacluster /usr/lib/heartbeat/ipfail
> > apiauth ipfail gid=haclient uid=hacluster
> >
> > ***************** lb2 *********************
> > debugfile /var/log/ha-debug
> > logfile /var/log/ha-log
> > logfacility local0
> > keepalive 2
> > deadtime 10
> > udpport 694
> > bcast eth0
> > mcast eth0 225.0.0.1 694 1 0
> > ucast eth0 192.168.31.201
> > auto_failback on
> > udp eth0
> > node lb1.tlthost.net
> > node lb2.tlthost.net
> > respawn hacluster /usr/lib/heartbeat/ipfail
> > apiauth ipfail gid=haclient uid=hacluster
> > --------------------------------------------
> > I don't have apache running on the node that heartbeat/HAProxy is
> > on, but I did check the syslog for anything out of place, and I
> > couldn't find anything. The Apache log on the web server node
> > actually shouldn't even see anything about HAProxy or heartbeat,
> > that's why I know something is wrong when I see an apache log error
> > "File does not exist:/var/www/apache2-default/haproxy".
> > Since I use
> > http://192.168.31.100/haproxy?stats to access the file, it shouldn't
> > be looking on the web server for it. It's like whatever happens
> > makes LB1 disappear. The funny thing is it all still works as it was
> > designed to, even when this problem is happening. The only way to
> > make the stats work again is to stop and start heartbeat. I tried
> > doing the same for HAProxy, but it did nothing. I didn't add any logs
> > here, because they are kind of big, even if I only include the part
> > when it starts till it fails. If you do want to see some, please let
> > me know which ones, and can I attach them instead of paste them?
>
> No idea how this HAproxy thing works, sorry. At any rate, it's not
> under control of Heartbeat. If the IP address (the only
> resource) is running where it should run (try ping and ifconfig), then
> you'll have to talk again to the other guys.
>
> Thanks,
>
> Dejan
>
> > Thanks, Tom
> >
> >
> > -----Original Message-----
> > From: Dejan Muhamedagic [mailto:[email protected]]
> > Sent: Monday, May 25, 2009 12:21 PM
> > To: [email protected]; General Linux-HA mailing list
> > Subject: Re: [Linux-HA] New HA user keeps loosing connection
> >
> > Hi,
> >
> > On Sat, May 23, 2009 at 01:13:53PM -0400, Tom Potwin wrote:
> > > Hi
> > >
> > > I hope I'm doing this correctly. I just joined this list after I
> > > tried looking for help with the HAProxy people.
> > >
> > > I'm using HAProxy and Heartbeat on two Ubuntu 8.04 servers. I have
> > > two Xen nodes on each of my physical machines. One is the load
> > > balancer and Heartbeat (LB1), the other is the actual LAMP web
> > > server (WEB1).
> > > Testing the HAProxy/Heartbeat setup, it seems to be working fine;
> > > by that I mean that when I shut off one of the web servers, it
> > > switches to the other one. My problem is I keep losing access to
> > > the HAProxy stats page.
> > > I know that isn't a huge problem, but I'm
> > > worried it might be a sign of a bigger problem somewhere.
> > >
> > > The stats show up fine for about 15-20 minutes, then I get an
> > > apache generic 404 error page. I also see: "File does not exist:
> > > /var/www/apache2-default/haproxy" show up in the apache error log
> > > as soon as I lose it. If I go back to my LB1 node and restart
> > > Heartbeat, it all comes back for another 15-20 minutes. There's
> > > nothing in any of the logs that I can see, other than it stops
> > > logging when it happens.
> > > I use http://192.168.31.100/haproxy?stats to get to that stats page.
> > > The .100 is the shared address between the load balancers. If
> > > I use 192.168.31.201, which is LB1, I get the browser's 404
> > > notice. If I use .100, it shows my apache generic 404 page. So
> > > somehow it stops seeing LB1, and goes to port 80 on my web server
> > > on the WEB1 node.
> > > That's where I see the apache error saying it can't find the
> > > HAProxy stats page.
> > >
> > > When I used the "tcpdump -q -i eth0 tcp port 80 and src host
> > > 192.168.31.100" command, it showed me looking at the stats, and
> > > the test web page:
> > > tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> > > listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
> > > 11:23:16.106664 IP 192.168.31.100.www > 192.168.30.64.2289: tcp 0
> > > 11:23:16.254209 IP 192.168.31.100.www > 192.168.30.64.2289: tcp 0
> > > 11:23:16.254409 IP 192.168.31.100.www > 192.168.30.64.2289: tcp 262
> > > 11:23:16.254501 IP 192.168.31.100.www > 192.168.30.64.2289: tcp 0
> > > 11:23:17.460534 IP 192.168.31.100.www > 192.168.30.64.2290: tcp 0
> > > 11:23:17.628385 IP 192.168.31.100.www > 192.168.30.64.2290: tcp 0
> > > 11:23:17.628590 IP 192.168.31.100.www > 192.168.30.64.2290: tcp 2712
> > > 11:23:17.839448 IP 192.168.31.100.www > 192.168.30.64.2290: tcp 2712
> > > 11:23:17.839460 IP 192.168.31.100.www > 192.168.30.64.2290: tcp 524
> > >
> > > Once I couldn't see the stats page again, the output stopped
> > > completely. I watched it on LB2 as well. It seems like it stops
> > > listening to the .100 IP address. If I use "tcpdump -q -i eth0 tcp
> > > port 80" I see LB1 checking web1 and web2, but nothing on the .100
> > > address.
> > > The HAProxy people said they thought it might be a Heartbeat
> > > problem, because after they checked my HAProxy setup, they
> > > couldn't find any problems there. Sorry for the long post, I'm
> > > just getting desperate for some help.
> >
> > OK. Doubt that this is a heartbeat problem, because they typically
> > get exercised immediately and not wait for 15 minutes to do so.
> > Anyway, can't say more unless you provide the configuration and
> > logs. Which heartbeat version do you use? What kind of configuration
> > (haresources or v2/CRM)?
> >
> > BTW, did you check the apache logs, i.e.
> > is that file (a cgi script I
> > guess) really missing or is there something else? Are all processes
> > which are supposed to be running there?
> >
> > Thanks,
> >
> > Dejan
> >
> > > Thanks, Tom
> > >
> > > _______________________________________________
> > > Linux-HA mailing list
> > > [email protected]
> > > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > > See also: http://linux-ha.org/ReportingProblems
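
The check Dejan suggests in the thread (try ping and ifconfig to see whether the IP address is running where it should) could be sketched roughly as below. This is only an illustration: the helper name `has_addr` is hypothetical and not from the thread, and it assumes that `ifconfig` prints the alias address as a separate token. The only values taken from the thread are the shared address 192.168.31.100 and the two tool names.

```shell
#!/bin/sh
# Hypothetical helper, not part of Heartbeat or HAProxy.
# Reads an interface listing (for example, the output of ifconfig) on
# stdin and succeeds if the given IPv4 address appears in it.
has_addr() {
    # -F: treat the address as a fixed string, so dots are literal
    # -w: whole-word match, so 192.168.31.10 does not match 192.168.31.100
    # -q: print nothing, report the result through the exit status only
    grep -qwF -- "$1"
}

# Example use on each Xen guest (LB1, LB2 and the web nodes), commented
# out here so the sketch has no side effects when sourced:
#   ifconfig | has_addr 192.168.31.100 && echo "shared address is up here"
#   ping -c 3 192.168.31.100
```

Running the commented commands on every guest while the stats page works, and again once it starts returning the Apache 404, would show whether the shared address is still configured on LB1 and whether any other guest also holds it.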
