Some more details: I let the production server suffer two more times to
test a narrowed-down config.
In the new setup the 1.5-dev haproxy instance acted only as a rate
limiter, with a 1.3 instance running behind it doing the real backend
balancing.
The config for the 1.5 rate limiter (which still died) was narrowed
down to:
global
    log 127.0.0.1 daemon debug
    maxconn 1024
    chroot /var/chroot/haproxy2
    uid 99
    gid 99
    daemon
    quiet
    pidfile /var/run/haproxy-private2.pid

defaults
    log global
    mode http
    #mode tcp
    option httplog
    option dontlognull
    option redispatch
    retries 3
    maxconn 3000
    contimeout 4000
    clitimeout 1000
    srvtimeout 200000
    stats enable
    stats scope mySite-webfarm
    stats scope ease-up
    stats uri /admin2?stats
    stats realm Haproxy\ Statistics
    stats auth user:pass

# only rate limiting in this 1.5.dev instance, forward everything else to 1.3.
listen mySite-webfarm 82.136.1.111:80
    option forwardfor
    option httpclose
    # let a stable 1.3 instance handle the real balancing etc
    server realhost 82.136.1.111:8011 check inter 20000 rise 2 fall 3
    contimeout 6000
    clitimeout 2000
    errorfile 503 /usr/local/etc/503error.html

    ### (d)dos protection ###
    stick-table type ip size 1m expire 10m store gpc0,conn_rate(10s)
    acl source_is_abuser src_get_gpc0 gt 0
    tcp-request connection track-sc1 src if ! source_is_abuser
    acl conn_rate_abuse sc1_conn_rate gt 40
    acl mark_as_abuser sc1_inc_gpc0 gt 0
    use_backend ease-up if source_is_abuser
    use_backend ease-up if conn_rate_abuse mark_as_abuser

backend ease-up
    mode http
    errorfile 503 /usr/local/etc/503error_dos.html
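For anyone skimming the config, here is a rough sketch (in Python, purely
illustrative, not haproxy internals) of what the (d)dos block is meant to do
per incoming connection, as I understand the gpc0/conn_rate mechanics; the
10-minute table expiry is not modeled:

```python
import time
from collections import defaultdict, deque

CONN_RATE_WINDOW = 10  # conn_rate(10s)
CONN_RATE_LIMIT = 40   # acl conn_rate_abuse sc1_conn_rate gt 40

gpc0 = defaultdict(int)          # sticky "abuser" flag per source IP
conn_times = defaultdict(deque)  # recent connection timestamps per IP

def handle_connection(src_ip, now=None):
    """Return the backend a new connection from src_ip would be sent to."""
    now = time.time() if now is None else now
    # acl source_is_abuser src_get_gpc0 gt 0
    if gpc0[src_ip] > 0:
        return "ease-up"  # already marked: not tracked again, just diverted
    # tcp-request connection track-sc1 src if ! source_is_abuser
    times = conn_times[src_ip]
    times.append(now)
    while times and now - times[0] > CONN_RATE_WINDOW:
        times.popleft()  # drop connections outside the 10s window
    # acl conn_rate_abuse sc1_conn_rate gt 40
    if len(times) > CONN_RATE_LIMIT:
        # acl mark_as_abuser sc1_inc_gpc0 gt 0 (side effect: mark the IP)
        gpc0[src_ip] += 1
        return "ease-up"
    return "mySite-webfarm"
```

So the 41st connection from one IP within 10 seconds goes to the ease-up
backend, and that IP stays marked for every later connection.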
And yeah, it died with the same socket error message as yesterday.
(The server was being hit by 30-40 reqs/sec at the time; it died after
~30 minutes.)
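In case it helps reproduction attempts, this is the kind of minimal
concurrent-load sketch I mean (Python; TARGET, worker count and request
count are placeholders I made up, not the real traffic mix):

```python
import threading
import urllib.request

TARGET = "http://127.0.0.1:8080/"  # hypothetical test listener, not the live IP
WORKERS = 4                        # raise towards production concurrency (100+)
REQUESTS_PER_WORKER = 25

results = []
lock = threading.Lock()

def worker():
    for _ in range(REQUESTS_PER_WORKER):
        try:
            with urllib.request.urlopen(TARGET, timeout=2) as resp:
                status = resp.status
        except OSError:
            status = None  # refused / timeout / 503: record it, keep hammering
        with lock:
            results.append(status)

threads = [threading.Thread(target=worker) for _ in range(WORKERS)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(f"{len(results)} requests sent, {results.count(None)} failed")
```

The point is many parallel sources and a high per-window rate, since the
test instance only differs from live in table size and concurrency.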
Hope it helps. Let me know if you need any more input.
Thanks,
Joe
Quoting "Jozsef R.Nagy" <[email protected]>:
> On 2010. 09. 15. 15:08, Willy Tarreau wrote:
>> On Wed, Sep 15, 2010 at 01:00:57PM +0200, Jozsef R.Nagy wrote:
>>>> Have you found a minimal way to reproduce this ? Also did you have the
>>>> tcp-request rules enabled in the conf causing this issue ?
>>> No minimal way yet, the config is the 'full' one i've set over
>>> previously with 2 listens (and no frontend/backend blocks) with the mods
>>> you've recommended:
>> OK, that's already useful information.
> Is it? :)
>>> So yea tcp-request rules were enabled.
>>> Not sure how to reproduce it for getting to minimal way, as it only
>>> happened 4 times on production setup, and can't really afford having it
>>> dead a few more times atm :/
>> I certainly can understand and thank you for these tests. Now we're
>> certain there's a nasty bug, so you should stay on the safe side.
>>> On test instance I can't get to reproducing it just yet..prolly not
>>> enough traffic or concurrency simply?
>> that's very possible.
>> Willy
>
> 3 hours, 40k randomized requests later test instance -with same config-
> still stands.
>
> Difference between test and live:
> - Test is only hit by 2 ips, thus the rate limiter tables are much smaller
> - The concurrency ratio is still lower (over 100 on production every
>   now and then)
>
> Otherwise running test instance on same host, same binary, very same
> config (except ports).
> Hopefully this helps a bit to narrow down the possible causes..
> Let me know if I can help in any way to track this down, in need of
> rate limiting.
>
> Thanks,
> Joe