I don't know if those will solve the problem (I doubt they will), but when you
put the machine back into the traffic stream, try to capture a few outputs if
things are going badly:
* stats output from haproxy (socket or web page, pref socket)
* netstat -antpoe output
* netstat -s output
* free -m output
* haproxy http logs
* iptables config output, if any
* be sure to have a tail -f /var/log/messages running before you start the
test to watch for conntrack and other messages
That will provide clues to what may be the problem(s).
Others will probably have ideas of other things to look for/capture while
trying the configuration.
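
The outputs listed above could be grabbed in one go with something like the sketch below. The output directory and the haproxy socket path are assumptions (the socket only exists if "stats socket" is set in the global section), and iptables-save stands in for "iptables config output". Commands that are not installed are skipped rather than treated as errors.

```shell
#!/bin/sh
# Sketch: snapshot the diagnostics listed above into one timestamped directory.
# OUT and HAPROXY_SOCK are assumptions; adjust them to your setup.
OUT="${OUT:-/tmp/lb-diag-$(date +%Y%m%d-%H%M%S)}"
HAPROXY_SOCK="${HAPROXY_SOCK:-/var/run/haproxy.sock}"
mkdir -p "$OUT"

# Run a command if it is installed; keep stdout and stderr in a named file.
capture() {
    name=$1; shift
    if command -v "$1" >/dev/null 2>&1; then
        "$@" >"$OUT/$name.txt" 2>&1 || true
    else
        echo "skipped: $1 not found" >"$OUT/$name.txt"
    fi
}

capture netstat-conns netstat -antpoe
capture netstat-stats netstat -s
capture free          free -m
capture iptables      iptables-save

# haproxy stats over the UNIX socket, only if the socket and socat are present.
if [ -S "$HAPROXY_SOCK" ] && command -v socat >/dev/null 2>&1; then
    echo "show stat" | socat stdio "unix-connect:$HAPROXY_SOCK" \
        >"$OUT/haproxy-stat.csv" 2>&1 || true
fi

echo "diagnostics written to $OUT"
```

Run it as root while the problem is happening (netstat -p and iptables-save want root), and keep the tail -f on /var/log/messages going in a separate terminal.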
On 2/7/10 2:20 AM, Peter Griffin wrote:
Hi there,
Ok I disabled selinux and increased check inter to 30s. I enabled an
http check of an ashx file because ASP is critical to the operation of
the site. It was already there but I disabled it earlier because of the
problems we were having:
option httpchk HEAD /testip.ashx HTTP/1.1\r\nHost:\ www.oursite.com
With regards to free, I'm ashamed to say that yes I did go after the
first line.
It happens to people who claim to be very linux savvy, so don't worry about it.
I also did a yum upgrade but will postpone 1.4rc1 until I
see how this change responds. Will put the LB back online when the
traffic is not that heavy as I cannot risk another outage and hence my
job :)
Will post a reply tomorrow afternoon.
Thank you so much you've been great.
On 7 February 2010 02:06, Hank A. Paulson <[email protected]> wrote:
You have selinux on, so it may be unhappy with some part of haproxy
- the directory it uses, the socket listeners, etc. Turn it off (if
you can) until you get everything working ok. Turning it off
requires a reboot.
To see if it is on:
# sestatus
google for how to turn it off
Also, when you say "free mem going down to 45Mb" are you looking at
the first line of "free" or the second line? Ignore the first line,
it is designed to cause panic. eg:
$ free -m
             total       used       free     shared    buffers     cached
Mem:         32244      32069        174          0          0      19578
-/+ buffers/cache:      12490      19753
Swap:         4095          0       4095
OMG, I only have 174MB of my 32GB of memory available!?!
- no, really 19.75 GB is still available.
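
For what it's worth, the "free" figure on the second line is just free + buffers + cached from the first line. A small sketch that redoes that sum on the sample above; avail_mb is a made-up helper name, and the column positions assume this older free layout (newer versions of free print an "available" column instead).

```shell
# Old-style `free -m` output, copied from the example above.
sample='             total       used       free     shared    buffers     cached
Mem:         32244      32069        174          0          0      19578
-/+ buffers/cache:      12490      19753
Swap:         4095          0       4095'

# Hypothetical helper: memory that is really available, in MB.
# On the "Mem:" line, $4 = free, $6 = buffers, $7 = cached.
avail_mb() {
    awk '/^Mem:/ { print $4 + $6 + $7 }'
}

printf '%s\n' "$sample" | avail_mb    # prints 19752
```

The result is 1 MB off the 19753 that free itself reports, since free rounds each column separately. On a live box, "free -m | avail_mb" does the same thing, as long as free still prints the old layout.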
On your haproxy config, if you log errors separately then you can
tail -f that error-only log and watch it as you start up haproxy.
And why not do http logging if you are doing http mode? Maybe I am
missing something.
I would back off the check inter to 30s or so and make it an http
check of a file that you know exists, if you can have any static
files on your servers. This will allow you to see that haproxy is
able to find that file, get a 200 response and verify that the
server really is up and responding fully, not just opening a
socket. If you can switch to 1.4rc1 then you get a lot more info
about the health check/health status on the stats page and you can
enable "option log-health-checks" as an additional aid to troubleshooting.
global
    log 127.0.0.1 local0
    log 127.0.0.1 local1 notice
    #log loghost local0 info
    maxconn 4096
    chroot /var/lib/haproxy
    user haproxy
    group haproxy
    daemon
    # debug
    #quiet
defaults
    log global
    mode http
    # "option" keywords are not accepted in the global section, so this lives here
    option log-separate-errors
    # option httplog
    option dontlognull
    retries 3
    option redispatch
    maxconn 4096
    contimeout 5s
    clitimeout 30s
    srvtimeout 30s
listen loadbalancer :80
    mode http
    balance roundrobin
    option forwardfor except 10.0.1.50
    option httpclose
    option httplog
    option httpchk HEAD /favicon.ico
    cookie SERVERID insert indirect nocache
    server WEB01 10.0.1.108:80 cookie A check inter 30s
    server WEB05 10.0.1.109:80 cookie B check inter 30s
listen statistics 10.0.1.50:8080
    stats enable
    stats auth stats:stats
    stats uri /
[BTW, did you do a "yum upgrade" (not "yum update") after your install
of F12? "yum update" misses certain kinds of packaging changes;
"yum upgrade" covers all updates, even if the name of a package
changes. yum upgrade should be the default used in yum examples. I
ask because many people don't do this and there are many security
fixes and other package bug fixes that have been posted.]
On 2/6/10 6:59 AM, Peter Griffin wrote:
Hi Will,
Yes X-Windows is installed, but the default init is runlevel 3 and I
have not started X for the past couple of days. The video card is an
addon card so I rule out shared memory.
With regards to eth1 I ran iptraf and can see that there is no traffic
on eth1 so I'd rule this out as well. I thought about listening for
stunnel requests on eth1 10.0.1.51 and connecting to haproxy on
10.0.1.50, but maybe this will cause more problems...
I had already ftp'd a file some 70MB to another machine on the same Vlan
and I did not see any problems whatsoever. What I'm planning to do now
is to setup the LB in another environment with another 2 Web servers and
1 DB server and stress the hell out of it. Then I can also test the
network traffic using Iperf.
Will report back in a few days, thank you once more.
On 6 February 2010 14:29, Willy Tarreau <[email protected]> wrote:
On Sat, Feb 06, 2010 at 01:16:00PM +0100, Peter Griffin wrote:
> Both http & https. Also both web servers started to take it in turns to
> report as DOWN but more frequently the second one than the first.
>
> I ran ethtool eth0 and can verify that it's full-duplex 1Gbps:
OK.
> I'm attaching dmesg, I don't understand most of it.
well, it shows some video driver issues, which are unrelated (did you
start a graphics environment on your LB ?). It seems it's reserving
some memory (64 or 512MB, I don't understand well) for the video. I
hope it's not a card with shared memory, as the higher the resolution,
the lower the remaining memory bandwidth for normal work.
But I don't see any iptables related issue there, so that's fine.
Stupid question, are you sure that your traffic passes via eth0 (the
gig one) ? I'm asking, because eth1 is a cheap 100 Mbps realtek 8139,
and if you got the routing wrong, it could explain a lot of networking
issues !
> I'll try to send a file
> in both directions to saturate the link as you suggested.
OK.
When doing that, don't bench the disks, just the network. For that,
create "sparse files", which are empty files for which the kernel
produces zeroes on the fly, and send them to /dev/null. Eg with ftp :
machine1$ dd if=/dev/null bs=1M count=0 seek=1024 of=1g.bin
machine2$ ftp machine1
ftp> recv 1g.bin /dev/null
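
To confirm that the dd command above really produces a sparse file, compare its apparent size with the space allocated on disk. A minimal sketch, assuming GNU dd (as in the command above) and a scratch directory from mktemp; the variable names are only for illustration.

```shell
# Create a 1 GiB sparse file in a scratch directory (same dd invocation as above).
workdir=$(mktemp -d)
dd if=/dev/null bs=1M count=0 seek=1024 of="$workdir/1g.bin" 2>/dev/null

# Apparent size in bytes versus space actually allocated on disk (in KiB).
apparent_bytes=$(wc -c < "$workdir/1g.bin")
allocated_kib=$(du -k "$workdir/1g.bin" | cut -f1)
echo "apparent: $apparent_bytes bytes, allocated: $allocated_kib KiB"

rm -rf "$workdir"
```

The apparent size is 1073741824 bytes, while the allocated figure should be close to zero on a filesystem that supports sparse files, so reading the file costs almost no disk I/O and the transfer measures the network only.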
Regards,
Willy