Re: [Linux-HA] New HA user keeps loosing connection

Dejan Muhamedagic Mon, 25 May 2009 14:03:37 -0700

Hi,

On Mon, May 25, 2009 at 04:37:32PM -0400, Tom Potwin wrote:
> Hi
> 
> You say "There's only the IP address resource". Should that be different?
> Did I set up haresources wrong?


No idea. That's the part which is up to you. Heartbeat's just
going to start/stop whatever resources you put in.
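For reference, a haresources line is just the preferred node name followed by the resources Heartbeat should start there, in order. A bare IP address is shorthand for the IPaddr resource. The minimal Python sketch below is a hypothetical helper, not part of Heartbeat, and it ignores backslash line continuations. It splits such a line, which makes it easy to see that the line posted for LB1 and LB2 names only the address.

```python
from __future__ import annotations

import ipaddress


def parse_haresources_line(line: str) -> tuple[str, list[str]]:
    """Split one haresources line into (preferred node, resources).

    Hypothetical helper for illustration only: it handles a single line,
    strips '#' comments and does not support backslash continuations.
    """
    fields = line.split("#", 1)[0].split()
    if not fields:
        raise ValueError("empty haresources line")
    node, raw_resources = fields[0], fields[1:]
    resources = []
    for res in raw_resources:
        try:
            # A bare address (optionally followed by /netmask/...) is
            # shorthand for the IPaddr resource script.
            ipaddress.ip_address(res.split("/", 1)[0])
            resources.append(f"IPaddr::{res}")
        except ValueError:
            # Anything else is taken verbatim as a resource name.
            resources.append(res)
    return node, resources


if __name__ == "__main__":
    print(parse_haresources_line("lb1.tlthost.net 192.168.31.100"))
```

For the posted line this prints `('lb1.tlthost.net', ['IPaddr::192.168.31.100'])`. A line such as `lb1.tlthost.net 192.168.31.100 haproxy` would, assuming an init script named `haproxy` exists where Heartbeat looks for resource scripts, also have Heartbeat start and stop HAProxy together with the address.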

> The .100 local address is the one shared between the two HAProxy load
> balancers. I can ping all the addresses from all 4 nodes. When I check
> ifconfig, it shows exactly what it should, and when I switch one LB node
> off, the other one takes over like it should. The reason I thought this
> must have something to do with heartbeat is that it all works again after
> I turn heartbeat off and on again.

So, heartbeat restart fixes the problem? That's strange. Apart
from the heartbeating, Heartbeat is just an init system. Once it
starts resources (whatever you put in haresources), the resources
are on their own. You can even test your setup by shutting down
Heartbeat everywhere and setting the alias IP address by hand.
That shouldn't make any difference whatsoever to that proxy
thing.
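Whether the shared address is actually configured on a given box can also be checked from a script instead of by eyeballing ifconfig. A minimal Python sketch (a hypothetical helper, not a Heartbeat tool), assuming Linux with net.ipv4.ip_nonlocal_bind left at its default of 0: binding a socket to an address the host does not own fails with EADDRNOTAVAIL, so a successful bind means the address is local. Run on each of the four nodes, it shows which of them currently hold 192.168.31.100.

```python
import errno
import socket
import sys


def address_is_local(ip: str) -> bool:
    """Return True if `ip` is assigned to an interface on this host.

    Assumes net.ipv4.ip_nonlocal_bind = 0; with it enabled the bind
    succeeds for any address and this check is meaningless.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        try:
            s.bind((ip, 0))  # port 0: let the kernel pick any free port
        except OSError as exc:
            if exc.errno == errno.EADDRNOTAVAIL:
                return False  # the kernel does not own this address
            raise  # any other error (bad address string, etc.) is real
    return True


if __name__ == "__main__":
    vip = sys.argv[1] if len(sys.argv) > 1 else "192.168.31.100"
    state = "local" if address_is_local(vip) else "NOT local"
    print(f"{vip} is {state} on this host")
```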

Thanks,

Dejan


> > 
> > Thanks for getting back to me. I am using haresources which looks like 
> > this for LB1 & LB2:
> >    lb1.tlthost.net 192.168.31.100
> 
> There's only the IP address resource.
> 
> > I think I'm using 2.1.3-2. It's the version for Ubuntu 8.04 that's 
> > listed. I can't seem to find how to check it on my server.
> > Here are the two ha.cf files:
> > ***************** lb1 *********************
> > debugfile /var/log/ha-debug
> > logfile /var/log/ha-log
> > logfacility     local0
> > keepalive 2
> > deadtime 10
> > udpport 694
> > bcast  eth0
> > mcast  eth0 225.0.0.1 694 1 0
> > ucast  eth0 192.168.31.211
> > auto_failback on
> > udp     eth0
> > node    lb1.tlthost.net
> > node    lb2.tlthost.net
> > respawn hacluster /usr/lib/heartbeat/ipfail
> > apiauth ipfail gid=haclient uid=hacluster
> > 
> > ***************** lb2 *********************
> > debugfile /var/log/ha-debug
> > logfile /var/log/ha-log
> > logfacility     local0
> > keepalive 2
> > deadtime 10
> > udpport 694
> > bcast  eth0
> > mcast  eth0 225.0.0.1 694 1 0
> > ucast  eth0 192.168.31.201
> > auto_failback on
> > udp     eth0
> > node    lb1.tlthost.net
> > node    lb2.tlthost.net
> > respawn hacluster /usr/lib/heartbeat/ipfail
> > apiauth ipfail gid=haclient uid=hacluster
> > --------------------------------------------
> > I don't have Apache running on the node that heartbeat/HAProxy is on,
> > but I did check the syslog for anything out of place, and I couldn't
> > find anything. The Apache log on the web server node shouldn't even
> > see anything about HAProxy or heartbeat; that's why I know something
> > is wrong when I see the Apache error "File does not exist:
> > /var/www/apache2-default/haproxy". Since I use
> > http://192.168.31.100/haproxy?stats to access the page, it shouldn't
> > be looking on the web server for it. It's as if whatever happens makes
> > LB1 disappear. The funny thing is that everything else still works as
> > designed, even while this problem is happening. The only way to make
> > the stats work again is to stop and start heartbeat. I tried doing the
> > same for HAProxy, but it did nothing. I didn't include any logs here,
> > because they are quite big, even if I only include the part from when
> > it starts until it fails. If you do want to see some, please let me
> > know which ones. Can I attach them instead of pasting them?
> 
> No idea how this HAProxy thing works, sorry. At any rate, it's not under
> the control of Heartbeat. If the IP address (the only resource) is
> running where it should run (try ping and ifconfig), then you'll have to
> talk to the other guys again.
> 
> Thanks,
> 
> Dejan
> 
> > Thanks, Tom
> > 
> > 
> > -----Original Message-----
> > From: Dejan Muhamedagic [mailto:[email protected]]
> > Sent: Monday, May 25, 2009 12:21 PM
> > To: [email protected]; General Linux-HA mailing list
> > Subject: Re: [Linux-HA] New HA user keeps loosing connection
> > 
> > Hi,
> > 
> > On Sat, May 23, 2009 at 01:13:53PM -0400, Tom Potwin wrote:
> > > Hi
> > > 
> > > I hope I'm doing this correctly. I just joined this list after I 
> > > tried looking for help with the HAProxy people.
> > > 
> > > I'm using HAProxy and Heartbeat on two Ubuntu 8.04 servers. I have
> > > two Xen nodes on each of my physical machines. One is the load
> > > balancer and Heartbeat node (LB1); the other is the actual LAMP web
> > > server (WEB1).
> > > Testing the HAProxy/Heartbeat setup, it seems to be working fine; by
> > > that I mean that when I shut off one of the web servers, it switches
> > > to the other one. My problem is that I keep losing access to the
> > > HAProxy stats page. I know that isn't a huge problem, but I'm worried
> > > it might be a sign of a bigger problem somewhere.
> > >
> > > The stats show up fine for about 15-20 minutes, then I get a generic
> > > Apache 404 error page. I also see "File does not exist:
> > > /var/www/apache2-default/haproxy" in the Apache error log as soon as
> > > I lose it. If I go back to my LB1 node and restart Heartbeat, it all
> > > comes back for another 15-20 minutes. There's nothing in any of the
> > > logs that I can see, other than that it stops logging when it happens.
> > > I use http://192.168.31.100/haproxy?stats to get to the stats page.
> > > The .100 is the address shared between the load balancers. If I
> > > use 192.168.31.201, which is LB1, I get the browser's 404 notice. If
> > > I use .100, it shows my generic Apache 404 page. So somehow it stops
> > > seeing LB1 and goes to port 80 on the web server on the WEB1 node.
> > > That's where I see the Apache error saying it can't find the HAProxy
> > > stats page.
> > > 
> > > When I ran "tcpdump -q -i eth0 tcp port 80 and src host 192.168.31.100",
> > > it showed the traffic while I was viewing the stats page and the test
> > > web page:
> > > tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
> > > listening on eth0, link-type EN10MB (Ethernet), capture size 96 bytes
> > > 11:23:16.106664 IP 192.168.31.100.www > 192.168.30.64.2289: tcp 0
> > > 11:23:16.254209 IP 192.168.31.100.www > 192.168.30.64.2289: tcp 0
> > > 11:23:16.254409 IP 192.168.31.100.www > 192.168.30.64.2289: tcp 262
> > > 11:23:16.254501 IP 192.168.31.100.www > 192.168.30.64.2289: tcp 0
> > > 11:23:17.460534 IP 192.168.31.100.www > 192.168.30.64.2290: tcp 0
> > > 11:23:17.628385 IP 192.168.31.100.www > 192.168.30.64.2290: tcp 0 
> > > 11:23:17.628590 IP 192.168.31.100.www > 192.168.30.64.2290: tcp 2712
> > > 11:23:17.839448 IP 192.168.31.100.www > 192.168.30.64.2290: tcp 2712 
> > > 11:23:17.839460 IP 192.168.31.100.www > 192.168.30.64.2290: tcp 524
> > > 
> > > As soon as I couldn't see the stats page again, the output stopped
> > > completely. I watched it on LB2 as well. It seems like it stops
> > > listening on the .100 IP address. If I use "tcpdump -q -i eth0 tcp
> > > port 80", I see LB1 checking web1 and web2, but nothing on the .100
> > > address.
> > > The HAProxy people said they thought it might be a Heartbeat
> > > problem, because after they checked my HAProxy setup, they couldn't
> > > find any problems there. Sorry for the long post; I'm just getting
> > > desperate for some help.
> > 
> > OK. I doubt that this is a Heartbeat problem, because those typically
> > show up immediately instead of waiting 15 minutes. Anyway, I can't say
> > more unless you provide the configuration and logs. Which Heartbeat
> > version do you use? What kind of configuration (haresources or
> > v2/CRM)?
> > 
> > BTW, did you check the Apache logs, i.e. is that file (a CGI script, I
> > guess) really missing, or is it something else? Are all the processes
> > which are supposed to be running there actually running?
> > 
> > Thanks,
> > 
> > Dejan
> > 
> > > Thanks, Tom
> > > 
> > > _______________________________________________
> > > Linux-HA mailing list
> > > [email protected]
> > > http://lists.linux-ha.org/mailman/listinfo/linux-ha
> > > See also: http://linux-ha.org/ReportingProblems
> > 