I don't know if those changes will solve the problem (I doubt they will), but when you put the machine back into the traffic stream, try to capture a few outputs if things go badly:

* stats output from haproxy (socket or web page, pref socket)
* netstat -antpoe output
* netstat -s output
* free -m output
* haproxy http logs
* iptables config output, if any
* be sure to have a tail -f /var/log/messages running before you start the test to watch for conntrack and other messages

That will provide clues to what may be the problem(s).
Others will probably have ideas of other things to look for/capture while trying the configuration.
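
A rough sketch of how the outputs listed above could be grabbed in one go while the problem is happening. The snapshot directory, the function names and the stats socket path are my assumptions, not anything from your setup. The stats part only works if "stats socket" is configured in the global section and socat is installed. The haproxy http logs are not copied because their location depends on your syslog config.

```shell
#!/bin/sh
# Sketch only: the snapshot directory and the stats socket path are assumptions.
STATS_SOCKET=${STATS_SOCKET:-/var/run/haproxy.stat}

# capture NAME COMMAND...: save the command's output to $OUT/NAME.txt and
# note a failure in the file instead of aborting the whole snapshot.
capture() {
    name=$1
    shift
    "$@" > "$OUT/$name.txt" 2>&1 ||
        echo "command failed or unavailable: $*" >> "$OUT/$name.txt"
}

# snapshot [DIR]: collect the outputs from the list above, print the directory.
snapshot() {
    OUT=${1:-/tmp/lb-snapshot-$(date +%Y%m%d-%H%M%S)}
    mkdir -p "$OUT" || return 1
    capture netstat-antpoe netstat -antpoe
    capture netstat-s netstat -s
    capture free free -m
    capture iptables iptables-save
    capture haproxy-stats sh -c \
        "echo 'show stat' | socat stdio unix-connect:$STATS_SOCKET"
    echo "$OUT"
}
```

Run "snapshot" as root (netstat -p and iptables-save need it) each time the servers flap, so there are several snapshots to compare.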

On 2/7/10 2:20 AM, Peter Griffin wrote:
Hi there,
Ok I disabled selinux and increased check inter to 30s.  I enabled an
http check of an ashx file because ASP is critical to the operation of
the site.  It was already there but I disabled it earlier because of the
problems we were having:
option httpchk HEAD /testip.ashx HTTP/1.1\r\nHost:\ www.oursite.com

With regards to free, I'm ashamed to say that yes I did go after the
first line.

It happens to people who claim to be very linux savvy, so don't worry about it.

I also did a yum upgrade but will postpone 1.4rc1 until I
see how this change responds.  Will put the LB back online when the
traffic is not that heavy as I cannot risk another outage and hence my
job :)

Will post a reply tomorrow afternoon.

Thank you so much you've been great.





On 7 February 2010 02:06, Hank A. Paulson <[email protected]> wrote:

    You have selinux on, so it may be unhappy with some part of haproxy
    - the directory it uses, the socket listeners, etc. Turn it off (if
    you can) until you get everything working ok. Turning it off
    requires a reboot.

    To see if it is on:
    # sestatus
    google for how to turn it off

    Also, when you say "free mem going down to 45Mb" are you looking at
    the first line of "free" or the second line? Ignore the first line,
    it is designed to cause panic. eg:

    $ free -m
                 total       used       free     shared    buffers     cached
    Mem:         32244      32069        174          0          0      19578
    -/+ buffers/cache:      12490      19753
    Swap:         4095          0       4095

    OMG, I only have 174MB of my 32GB of memory available!?!
    - no, really 19.75 GB is still available.
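
The arithmetic behind that second line, as a small sketch: memory that is really available is free + buffers + cached from the "Mem:" line. The function name is made up for illustration, and it assumes the older column layout of free shown above (total, used, free, shared, buffers, cached). Newer versions of free print different columns, so the field numbers would need adjusting there.

```shell
# real_free_mb: read "free -m" output (older layout) on stdin and print
# free + buffers + cached in MB, i.e. what applications can still use.
real_free_mb() {
    awk '$1 == "Mem:" { print $4 + $6 + $7 }'
}

# Usage: free -m | real_free_mb
```

With the sample figures above this gives 174 + 0 + 19578 = 19752, which matches the "-/+ buffers/cache" free column to within rounding.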

    On your haproxy config, if you log errors separately then you can
    tail -f that error-only log and watch it as you start up haproxy.
    And why not do http logging if you are doing http mode? Maybe I am
    missing something.

    I would back off the check inter to 30s or so and make it an http
    check of a file that you know exists, if you can have any static
    files on your servers. This will allow you to see that haproxy is
    able to find that file, get a 200 response and verify that the
    server really is up and responding fully, not just opening a
    socket. If you can switch to 1.4rc1 then you get a lot more info
    about the health check/health status on the stats page and you can
    set "option log-health-checks" as an additional aid to troubleshooting.
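
To see what haproxy would see, the same kind of HEAD request can be sent by hand from the LB. This is only a sketch: the function names are hypothetical and it assumes curl is installed. By default an http check counts a 2xx or 3xx status as up, which is what the first function mimics.

```shell
# status_is_up "STATUS LINE": succeed if the status code is 2xx or 3xx.
status_is_up() {
    code=$(printf '%s\n' "$1" | awk '{ print $2 }')
    case $code in
        2??|3??) return 0 ;;
        *) return 1 ;;
    esac
}

# probe IP:PORT PATH HOST: send a HEAD request (curl -I) with a 5 second
# limit and report the server as UP or DOWN based on the status line.
probe() {
    line=$(curl -s -I -m 5 -H "Host: $3" "http://$1$2" | head -n 1 | tr -d '\r')
    if status_is_up "$line"; then
        echo "$1 UP ($line)"
    else
        echo "$1 DOWN (${line:-no response})"
    fi
}

# Usage: probe 10.0.1.108:80 /favicon.ico www.oursite.com
```

If this reports DOWN now and then while the site looks fine in a browser, the problem is more likely between the LB and the servers than in haproxy itself.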


    global
            log 127.0.0.1   local0
            log 127.0.0.1   local1 notice
            #log loghost    local0 info

            maxconn 4096
            chroot /var/lib/haproxy
            user haproxy
            group haproxy
            daemon
    #       debug
            #quiet

    defaults
            log     global
            mode    http
    #       option  httplog
            option  dontlognull
            # "option" keywords are not accepted in the global section,
            # so log-separate-errors goes here
            option  log-separate-errors
            retries 3
            option redispatch
            maxconn 4096
            contimeout      5s
            clitimeout      30s
            srvtimeout      30s


    listen loadbalancer :80
                    mode http
                    balance roundrobin
                    option forwardfor except 10.0.1.50
                    option httpclose
                    option httplog
                    option httpchk HEAD /favicon.ico

                    cookie SERVERID insert indirect nocache
                    server WEB01 10.0.1.108:80 cookie A check inter 30s
                    server WEB05 10.0.1.109:80 cookie B check inter 30s


    listen statistics 10.0.1.50:8080
            stats enable
            stats auth stats:stats
            stats uri /

    [BTW, did you do a "yum upgrade" (not "yum update") after your install
    of F12? "yum update" misses certain kinds of packaging changes, while
    "yum upgrade" covers all updates, even if the name of a package
    changes. "yum upgrade" should be the default used in yum examples. I
    ask because many people don't do this and there are many security
    fixes and other package bug fixes that have been posted.]


    On 2/6/10 6:59 AM, Peter Griffin wrote:

        Hi Will,
        Yes X-Windows is installed, but the default init is runlevel 3 and I
        have not started X for the past couple of days.  The video card is an
        addon card so I rule out shared memory.

        With regards to eth1 I ran iptraf and can see that there is no traffic
        on eth1 so I'd rule this out as well.  I thought about listening for
        stunnel requests on eth1 10.0.1.51 and connecting to haproxy on
        10.0.1.50, but maybe this will cause more problems...
        I had already ftp'd a file some 70MB to another machine on the same Vlan
        and I did not see any problems whatsoever.  What I'm planning to do now
        is to setup the LB in another environment with another 2 Web servers and
        1 DB server and stress the hell out of it.  Then I can also test the
        network traffic using Iperf.
        Will report back in a few days, thank you once more.




        On 6 February 2010 14:29, Willy Tarreau <[email protected]> wrote:

            On Sat, Feb 06, 2010 at 01:16:00PM +0100, Peter Griffin wrote:
         > Both http & https.  Also both web servers started to take it in turns to
         > report as DOWN but more frequently the second one than the first.
         >
         > I ran ethtool eth0 and can verify that it's full-duplex 1Gbps:

            OK.

         > I'm attaching dmesg, I don't understand most of it.

            well, it shows some video driver issues, which are unrelated (did you
            start a graphics environment on your LB ?). It seems it's reserving
            some memory (64 or 512MB, I don't understand well) for the video. I
            hope it's not a card with shared memory, as the higher the resolution,
            the lower the remaining memory bandwidth for normal work.

            But I don't see any iptables related issue there, so that's fine.

            Stupid question, are you sure that your traffic passes via eth0 (the
            gig one) ? I'm asking, because eth1 is a cheap 100 Mbps realtek 8139,
            and if you got the routing wrong, it could explain a lot of networking
            issues !

         > I'll try to send a file
         > in both directions to saturate the link as you suggested.

            OK.

            When doing that, don't bench the disks, just the network. For that,
            create "sparse files", which are empty files for which the kernel
            produces zeroes on the fly, and send them to /dev/null. Eg
            with ftp :

            machine1$ dd if=/dev/null bs=1M count=0 seek=1024 of=1g.bin

            machine2$ ftp machine1
         > recv 1g.bin /dev/null
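
A small sketch for checking that the file really is sparse before using it for the transfer test. The helper names are made up for illustration, and bs=1M assumes GNU dd as in the command above. The apparent size should be the full size while the space allocated on disk stays near zero.

```shell
# make_sparse PATH SIZE_MB: create a file of SIZE_MB megabytes without
# writing any data blocks (same dd trick as above).
make_sparse() {
    dd if=/dev/null of="$1" bs=1M count=0 seek="$2" 2>/dev/null
}

# apparent_bytes PATH: the size the file claims to have.
apparent_bytes() {
    wc -c < "$1" | tr -d ' '
}

# allocated_kb PATH: the space actually allocated on disk, in KB.
allocated_kb() {
    du -k "$1" | awk '{ print $1 }'
}

# Usage:
#   make_sparse 1g.bin 1024
#   apparent_bytes 1g.bin    # full size in bytes
#   allocated_kb 1g.bin      # close to 0 on a filesystem with sparse support
```

If allocated_kb comes back close to the full size, the filesystem does not support sparse files and the test will read from the disk after all.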


            Regards,
            Willy




