Re: simple failover is failing

Willy Tarreau Sat, 15 Oct 2011 22:47:06 -0700

Hello Tim,

On Sat, Oct 15, 2011 at 06:08:55PM -0000, Tim Dunphy wrote:
> Hello again list,
> 
>  I have a little more info to add.
> 
>  I was able to start both load balancers in debug mode, and I found something 
> interesting. On lb1 (the functioning node) I see activity in the debug output 
> as I access the sites, but in the debug output of lb2 this is all I see:
> 
> 
> [root@VIRTCENT02:~] #haproxy -d -f /etc/haproxy/haproxy.cfg
> Available polling systems :
>      sepoll : pref=400,  test result OK
>        poll : pref=200,  test result OK
>      select : pref=150,  test result OK
>       epoll : disabled,  test result OK
> Total: 4 (3 usable), will use sepoll.
> Using sepoll() as the polling mechanism.
> 00000001:www.accept(0004)=0006 from [192.168.1.34:46634]
> 00000001:www.clireq[0006:ffff]: GET /admin?stats;csv HTTP/1.1
> 00000001:www.clihdr[0006:ffff]: TE: deflate,gzip;q=0.3
> 00000001:www.clihdr[0006:ffff]: Connection: TE, close
> 00000001:www.clihdr[0006:ffff]: Host: 192.168.1.200
> 00000001:www.clihdr[0006:ffff]: User-Agent: check_haproxy.pl
> 00000001:www.srvcls[0006:ffff]
> 00000001:www.clicls[0006:ffff]
> 00000001:www.closed[0006:ffff]
> 
> 
> What you see here is the Nagios server requesting the stats CSV to confirm 
> that the server is alive. The Nagios check succeeds and reports the site as 
> alive, but the sites will not load in any browser. 
> 
> If I fire up lb1 the sites start to work and I see this in the debug logs:
> 
> 
> [root@VIRTCENT01:~] #haproxy -f /etc/haproxy/haproxy.cfg -d
> Available polling systems :
>      sepoll : pref=400,  test result OK
>        poll : pref=200,  test result OK
>      select : pref=150,  test result OK
>       epoll : disabled,  test result OK
> Total: 4 (3 usable), will use sepoll.
> Using sepoll() as the polling mechanism.
> 00000000:www.accept(0004)=0006 from [71.187.226.165:1024]
> 00000000:www.clireq[0006:ffff]: GET /cake/ HTTP/1.1
> 00000000:www.clihdr[0006:ffff]: Host: stage.jokefire.com
> 00000000:www.clihdr[0006:ffff]: User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:7.0.1) Gecko/20100101 Firefox/7.0.1
> 00000000:www.clihdr[0006:ffff]: Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
> 00000000:www.clihdr[0006:ffff]: Accept-Language: en-us,en;q=0.5
> 00000000:www.clihdr[0006:ffff]: Accept-Encoding: gzip, deflate
> 00000000:www.clihdr[0006:ffff]: Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
> 00000000:www.clihdr[0006:ffff]: Connection: keep-alive
> 00000000:www.clihdr[0006:ffff]: Cookie: CAKEPHP=l8ug7fl47khnhvhjmcgtc3kcu2; SERVERID=B
> 00000000:www.clihdr[0006:ffff]: Cache-Control: max-age=0
> 00000000:app.srvrep[0006:0007]: HTTP/1.1 200 OK
> 00000000:app.srvhdr[0006:0007]: Date: Sat, 15 Oct 2011 18:06:20 GMT
> 00000000:app.srvhdr[0006:0007]: Server: Apache/2.2.20 (CentOS)
> 00000000:app.srvhdr[0006:0007]: X-Powered-By: PHP/5.3.6
> 00000000:app.srvhdr[0006:0007]: P3P: CP="NOI ADM DEV PSAi COM NAV OUR OTRo STP IND DEM"
> 00000000:app.srvhdr[0006:0007]: Content-Length: 4937
> 00000000:app.srvhdr[0006:0007]: Connection: close
> 00000000:app.srvhdr[0006:0007]: Content-Type: text/html; charset=UTF-8
> 00000000:app.srvcls[0006:0007]
> 00000000:app.clicls[0006:0007]
> 00000000:app.closed[0006:0007]
> 00000001:www.accept(0004)=0006 from [71.187.226.165:1025]
> 00000001:www.clireq[0006:ffff]: GET /cake/app/webroot/css/cake.generic.css HTTP/1.1
> 00000001:www.clihdr[0006:ffff]: Host: stage.jokefire.com
> 00000001:www.clihdr[0006:ffff]: User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.6; rv:7.0.1) Gecko/20100101 Firefox/7.0.1
> 00000001:www.clihdr[0006:ffff]: Accept: text/css,*/*;q=0.1
> 00000001:www.clihdr[0006:ffff]: Accept-Language: en-us,en;q=0.5
> 00000001:www.clihdr[0006:ffff]: Accept-Encoding: gzip, deflate
> 00000001:www.clihdr[0006:ffff]: Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
> 
> 
> Thanks once again for any insight you may have to share!


Well, this simply means that your second node never receives the traffic
for the VIP. Check the following:
  - whether your second node actually holds the virtual IP address when
    it is running alone;
  - whether your clients, or the router between your clients and the LB,
    have updated their ARP caches to point to node 2.

I suspect that at least one of these two points is wrong.
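
Both checks can be scripted. What follows is only a rough sketch, assuming
a Linux system with the iproute2 "ip" tool. The function names are made up
for illustration, and the address 192.168.1.200 is a guess taken from the
Host header of the Nagios request above; substitute your real VIP. The
first function reports whether the VIP is configured on the local node,
the second returns the MAC address that a client or router has cached for
the VIP, which you can compare with the MAC of node 2's interface.

```python
import subprocess


def holds_vip(ip_addr_output, vip):
    """Return True if `vip` is listed in `ip -o -4 addr show` output."""
    for line in ip_addr_output.splitlines():
        fields = line.split()
        # Typical line: "2: eth0    inet 192.168.1.200/24 brd ... scope global eth0"
        if "inet" in fields:
            addr = fields[fields.index("inet") + 1].split("/")[0]
            if addr == vip:
                return True
    return False


def neighbour_mac(ip_neigh_output, ip):
    """Return the MAC cached for `ip` in `ip neigh show` output, or None."""
    for line in ip_neigh_output.splitlines():
        fields = line.split()
        # Typical line: "192.168.1.200 dev eth0 lladdr 00:11:22:33:44:55 REACHABLE"
        if fields and fields[0] == ip and "lladdr" in fields:
            return fields[fields.index("lladdr") + 1]
    return None


def run(cmd):
    """Run a command and return its standard output as text."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def report(vip="192.168.1.200"):  # assumed VIP, not confirmed by the thread
    # On node 2, with node 1 stopped: is the VIP configured locally?
    print("VIP held locally:", holds_vip(run(["ip", "-o", "-4", "addr", "show"]), vip))
    # On a client or router: which MAC does the ARP cache map the VIP to?
    print("Cached MAC for VIP:", neighbour_mac(run(["ip", "neigh", "show"]), vip))
```

Call report() on node 2 to test the first point, and on a client (or use
the equivalent ARP table command on the router) to test the second. If the
cached MAC still belongs to node 1 after failover, the ARP cache has not
been updated.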

Regards,
Willy
