Some more details: I let the production server suffer two more times to test a narrowed-down config. The new config ran only as a rate-limiting 1.5-dev haproxy instance, with a 1.3 instance running in the background doing the real backend work.
So the 1.5 rate limiter's config (still dying) was narrowed down to:

global
        log     127.0.0.1       daemon  debug
        maxconn 1024
        chroot /var/chroot/haproxy2
        uid 99
        gid 99
        daemon
        quiet
        pidfile /var/run/haproxy-private2.pid

defaults
        log     global
        mode    http
        #mode   tcp
        option  httplog
        option  dontlognull
        option redispatch
        retries 3
        maxconn 3000
        contimeout      4000
        clitimeout      1000
        srvtimeout      200000
        stats enable
        stats scope mySite-webfarm
        stats scope ease-up
        stats uri     /admin2?stats
        stats realm   Haproxy\ Statistics
        stats auth    user:pass

# only rate limiting in this 1.5.dev instance, forward everything else to 1.3.
listen mySite-webfarm 82.136.1.111:80
    option forwardfor
    option httpclose

    #let a stable 1.3 instance handle the real balancing etc
    server realhost 82.136.1.111:8011 check inter 20000 rise 2 fall 3

    contimeout  6000
    clitimeout  2000

    errorfile 503 /usr/local/etc/503error.html

    ### (d)dos protection ###
    stick-table type ip size 1m expire 10m store gpc0,conn_rate(10s)
    acl source_is_abuser   src_get_gpc0 gt 0
    tcp-request connection track-sc1 src if ! source_is_abuser
    acl conn_rate_abuse    sc1_conn_rate gt 40
    acl mark_as_abuser     sc1_inc_gpc0 gt 0
    use_backend ease-up    if source_is_abuser
    use_backend ease-up    if conn_rate_abuse mark_as_abuser

backend ease-up
    mode http
    errorfile 503 /usr/local/etc/503error_dos.html
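For anyone puzzling over the four (d)dos lines above, here is a rough Python sketch of the intended logic (my own toy simulation, not HAProxy internals, and the class/field names are made up): gpc0 acts as a sticky per-IP abuse flag, conn_rate counts connections over a sliding 10s window, and a source exceeding 40 conns/10s gets gpc0 bumped and is routed to the ease-up backend from then on.

```python
from collections import defaultdict, deque

WINDOW = 10       # conn_rate(10s)
RATE_LIMIT = 40   # acl conn_rate_abuse sc1_conn_rate gt 40

class StickTable:
    """Toy model of: stick-table type ip store gpc0,conn_rate(10s)."""
    def __init__(self):
        self.gpc0 = defaultdict(int)     # sticky abuse flag per src IP
        self.conns = defaultdict(deque)  # connection timestamps per src IP

    def route(self, src, now):
        """Return the backend a new connection from `src` would hit."""
        # acl source_is_abuser: src_get_gpc0 gt 0
        if self.gpc0[src] > 0:
            return "ease-up"
        # tcp-request connection track-sc1 src (only non-abusers tracked)
        q = self.conns[src]
        q.append(now)
        while q and q[0] <= now - WINDOW:  # slide the 10s window forward
            q.popleft()
        # acl conn_rate_abuse + mark_as_abuser (sc1_inc_gpc0 side effect)
        if len(q) > RATE_LIMIT:
            self.gpc0[src] += 1
            return "ease-up"
        return "realhost"
```

So a burst of 50 connections in half a second from one IP gets its first 40 through to realhost, then flips to ease-up and stays there, while a slow client is never flagged. (This says nothing about the crash itself, of course; it's just what the config is supposed to do.)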

And yeah, it died with the same socket error message as yesterday.
(The server was hit by 30-40 reqs/sec during this time; it died after ~30 mins.)

Hope it helps.. let me know if you need any more input.
Thanks,
Joe

Quoting "Jozsef R.Nagy" <[email protected]>:

On 2010. 09. 15. 15:08, Willy Tarreau wrote:
On Wed, Sep 15, 2010 at 01:00:57PM +0200, Jozsef R.Nagy wrote:

Have you found a minimal way to reproduce this ? Also did you have the
tcp-request rules enabled in the conf causing this issue ?



No minimal way yet; the config is the 'full' one I've sent over
previously, with 2 listens (and no frontend/backend blocks) plus the mods
you've recommended:

OK, that's already useful information.


Is it? :)
So yeah, tcp-request rules were enabled.
Not sure how to reproduce it in a minimal way, as it only
happened 4 times on the production setup, and I can't really afford having
it dead a few more times atm :/

I certainly can understand and thank you for these tests. Now we're
certain there's a nasty bug, so you should stay on the safe side.


On the test instance I can't reproduce it just yet.. probably just not
enough traffic or concurrency?

that's very possible.
  Willy


3 hours and 40k randomized requests later, the test instance (with the same
config) still stands.
Differences between test and live:
- The test is only hit by 2 IPs, so the rate-limiter tables are much smaller
- Concurrency is still lower (it goes over 100 on production every
now and then)

Otherwise the test instance runs on the same host, with the same binary and
the very same config (except ports).

Hopefully this helps a bit in narrowing down the possible causes..
Let me know if I can help in any way to track this down; I'm in need of
rate limiting.

Thanks,
Joe



