Hello Tim,
On Sat, Oct 15, 2011 at 06:08:55PM -0000, Tim Dunphy wrote:
> Hello again list,
>
> I have a little more info to add..
>
> I was able to start up both lb's in debug mode. And I found some interesting
> info.. on lb1 (the functioning node) I see activity in the debug logs as I
> access the sites. But in the debug logs of lb2 this is all I see:
>
>
> [root@VIRTCENT02:~] #haproxy -d -f /etc/haproxy/haproxy.cfg
> Available polling systems :
> sepoll : pref=400, test result OK
> poll : pref=200, test result OK
> select : pref=150, test result OK
> epoll : disabled, test result OK
> Total: 4 (3 usable), will use sepoll.
> Using sepoll() as the polling mechanism.
> 00000001:www.accept(0004)=0006 from [192.168.1.34:46634]
> 00000001:www.clireq[0006:ffff]: GET /admin?stats;csv HTTP/1.1
> 00000001:www.clihdr[0006:ffff]: TE: deflate,gzip;q=0.3
> 00000001:www.clihdr[0006:ffff]: Connection: TE, close
> 00000001:www.clihdr[0006:ffff]: Host: 192.168.1.200
> 00000001:www.clihdr[0006:ffff]: User-Agent: check_haproxy.pl
> 00000001:www.srvcls[0006:ffff]
> 00000001:www.clicls[0006:ffff]
> 00000001:www.closed[0006:ffff]
>
>
> What you see here is the nagios server checking for a CSV file to indicate
> that the server is alive. And the nagios check is successful and reports the
> site is alive. But the sites will not appear in any browser.
>
> If I fire up lb1 the sites start to work and I see this in the debug logs:
>
>
> [root@VIRTCENT01:~] #haproxy -f /etc/haproxy/haproxy.cfg -d
> Available polling systems :
> sepoll : pref=400, test result OK
> poll : pref=200, test result OK
> select : pref=150, test result OK
> epoll : disabled, test result OK
> Total: 4 (3 usable), will use sepoll.
> Using sepoll() as the polling mechanism.
> 00000000:www.accept(0004)=0006 from [71.187.226.165:1024]
> 00000000:www.clireq[0006:ffff]: GET /cake/ HTTP/1.1
> 00000000:www.clihdr[0006:ffff]: Host: stage.jokefire.com
> 00000000:www.clihdr[0006:ffff]: User-Agent: Mozilla/5.0 (Macintosh; Intel Mac
> OS X 10.6; rv:7.0.1) Gecko/20100101 Firefox/7.0.1
> 00000000:www.clihdr[0006:ffff]: Accept:
> text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
> 00000000:www.clihdr[0006:ffff]: Accept-Language: en-us,en;q=0.5
> 00000000:www.clihdr[0006:ffff]: Accept-Encoding: gzip, deflate
> 00000000:www.clihdr[0006:ffff]: Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
> 00000000:www.clihdr[0006:ffff]: Connection: keep-alive
> 00000000:www.clihdr[0006:ffff]: Cookie: CAKEPHP=l8ug7fl47khnhvhjmcgtc3kcu2;
> SERVERID=B
> 00000000:www.clihdr[0006:ffff]: Cache-Control: max-age=0
> 00000000:app.srvrep[0006:0007]: HTTP/1.1 200 OK
> 00000000:app.srvhdr[0006:0007]: Date: Sat, 15 Oct 2011 18:06:20 GMT
> 00000000:app.srvhdr[0006:0007]: Server: Apache/2.2.20 (CentOS)
> 00000000:app.srvhdr[0006:0007]: X-Powered-By: PHP/5.3.6
> 00000000:app.srvhdr[0006:0007]: P3P: CP="NOI ADM DEV PSAi COM NAV OUR OTRo
> STP IND DEM"
> 00000000:app.srvhdr[0006:0007]: Content-Length: 4937
> 00000000:app.srvhdr[0006:0007]: Connection: close
> 00000000:app.srvhdr[0006:0007]: Content-Type: text/html; charset=UTF-8
> 00000000:app.srvcls[0006:0007]
> 00000000:app.clicls[0006:0007]
> 00000000:app.closed[0006:0007]
> 00000001:www.accept(0004)=0006 from [71.187.226.165:1025]
> 00000001:www.clireq[0006:ffff]: GET /cake/app/webroot/css/cake.generic.css
> HTTP/1.1
> 00000001:www.clihdr[0006:ffff]: Host: stage.jokefire.com
> 00000001:www.clihdr[0006:ffff]: User-Agent: Mozilla/5.0 (Macintosh; Intel Mac
> OS X 10.6; rv:7.0.1) Gecko/20100101 Firefox/7.0.1
> 00000001:www.clihdr[0006:ffff]: Accept: text/css,*/*;q=0.1
> 00000001:www.clihdr[0006:ffff]: Accept-Language: en-us,en;q=0.5
> 00000001:www.clihdr[0006:ffff]: Accept-Encoding: gzip, deflate
> 00000001:www.clihdr[0006:ffff]: Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
>
>
> Thanks once again for any insight you may have to share!

Well, this simply means that your second node never receives the traffic
for the VIP: the only request in its debug output is the Nagios check
coming from 192.168.1.34, and no browser request ever reaches it. Check
the following things:

  - whether your second node correctly holds the virtual IP address when
    it is alone;
  - whether your clients, or the router between your clients and the LB,
    have updated their ARP cache to point to node 2.

I suspect that at least one of these two points is wrong.
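
If it helps, both points can be checked with something along these lines.
This is only a rough sketch: the address 192.0.2.10 and the interface eth0
are placeholders I made up (I don't know your real VIP or interface name),
and holds_vip / check_node are just illustrative helper names. It assumes
the "ip" tool from iproute and the iputils version of arping are installed.

```shell
#!/bin/sh
# Sketch only. VIP and IFACE are placeholders, not values from this thread;
# override them in the environment with your real VIP and interface.
VIP="${VIP:-192.0.2.10}"
IFACE="${IFACE:-eth0}"

# Reads `ip -o -4 addr show` output on stdin and succeeds only if the
# address given as $1 is listed (fixed-string match on "inet ADDR/").
holds_vip() {
    grep -qF "inet $1/"
}

# Run on node 2, as root, while node 1 is stopped.
check_node() {
    if ip -o -4 addr show dev "$IFACE" | holds_vip "$VIP"; then
        echo "$VIP is configured on $IFACE"
        # Send unsolicited (gratuitous) ARP so that neighbours on the
        # LAN refresh their cache entry for the VIP.
        arping -U -I "$IFACE" -c 3 "$VIP"
    else
        echo "$VIP is NOT configured on $IFACE"
        return 1
    fi
}
```

Call check_node on node 2 once node 1 is down. If it reports that the VIP
is missing, the problem is in whatever is supposed to move the address. If
the VIP is there, run "arp -n" on a client or on the router and compare the
MAC address shown for the VIP with the one of node 2's interface.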
Regards,
Willy