Hi,


> I hope you are not to angry that I ask a Linux network question here.
>
> The reason is that on this list are also very experienced users about
> high traffic
> and high performance setups.

Still off-topic, as it isn't a haproxy issue. If you think that's a kernel
issue, LKML is the right place to ask for support. But with the information
you have, I disagree with the conclusion that this is a kernel issue. Collect
more information about your box and provide it on the nginx mailing list.

If your issue is affecting your business and you are unable to fix it, I
suggest you buy commercial support from nginx.



> Cite from
>
> http://mailman.nginx.org/pipermail/nginx/2014-February/042148.html
>
> ####
> currently we have a huge traffic come up.
>
> ~500 r/s
> http://download.none.at/nginx_request-day.png
>
> ~3.5K active connections
> http://download.none.at/port_www-day.png
> http://download.none.at/nginx_combined.png
>
> The Peaks are the raw values from module status.
>
> ~1.1g b/s traffic
> http://download.none.at/if_eth2-day.png
> http://download.none.at/tcp-day.png
>
> I have tried to setup the machine for this traffic but it looks to me
> that was not successfully.

While you did collect a lot of information about your traffic, you don't
seem to be looking at the actual bottleneck, which would be more important.

How is the CPU load on those 24 CPUs? Where is it spent: kernel, userspace
or I/O?

Provide some "vmstat 1" output, as well as top and ps output.
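
If you want the kernel/userspace/io split without eyeballing vmstat, something
like the following rough Python sketch could do it. It assumes the usual Linux
/proc/stat layout for the aggregate "cpu" line; the function names are mine and
purely illustrative.

```python
# Column order of the aggregate "cpu" line in /proc/stat (values are in jiffies).
CPU_FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]


def parse_cpu_line(stat_text):
    """Return the aggregate CPU counters from the text of /proc/stat as a dict."""
    for line in stat_text.splitlines():
        parts = line.split()
        # The aggregate line is labelled exactly "cpu"; "cpu0", "cpu1", ... are per-core.
        if parts and parts[0] == "cpu":
            values = [int(v) for v in parts[1:1 + len(CPU_FIELDS)]]
            return dict(zip(CPU_FIELDS, values))
    raise ValueError("no aggregate cpu line found")


def cpu_breakdown(before, after):
    """Percentage of CPU time per category between two samples."""
    deltas = {name: after[name] - before[name] for name in before}
    total = sum(deltas.values())
    if total == 0:
        return {name: 0.0 for name in deltas}
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


def read_proc_stat(path="/proc/stat"):
    """Read the raw counters; call twice, a second or so apart, and compare."""
    with open(path) as handle:
        return parse_cpu_line(handle.read())
```

Take two read_proc_stat() samples a second apart and pass them to
cpu_breakdown(). High "system" or "softirq" points at kernel/network work,
high "iowait" at the disks, high "user" at nginx/php-fpm/postgresql themselves.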



> HW:
> 24 CPUs

Why are you using 6 workers when you have 24 CPUs?



> When I activate the aio, nginx and xfs crashes, that's why aio is not
> active.

Did you report this upstream? What do you mean by "xfs crashes"? Are you
aware of the AIO limitations on Linux, as mentioned in the nginx
documentation? Please don't use the nginx wiki, but the official nginx
documentation. The wiki is obsolete, partially wrong and unmaintained.

Anyway, I doubt that AIO/directio is a good idea on your box, as it bypasses
the page cache.



> On this machine also runs a postgresql and php-fpm but the current
> traffic is from delivering of pictures from the file system.

You are serving pictures. Are you using appropriate HTTP headers to
cache the pictures on the browser side to avoid unnecessary requests?
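
To make clear what I mean by "appropriate", here is a deliberately simplified
Python sketch that looks at a response's headers and says whether a browser
may reuse the picture without asking again. The function name is made up, and
it only checks Cache-Control and the mere presence of Expires; it does not
validate the Expires date or handle every directive.

```python
def allows_browser_caching(headers):
    """Return True if the headers let a browser reuse the response without a new request.

    Simplified on purpose: only Cache-Control max-age/no-store/no-cache are
    inspected, and an Expires header is assumed to lie in the future.
    """
    normalized = {name.lower(): value for name, value in headers.items()}
    directives = [
        part.strip().lower()
        for part in normalized.get("cache-control", "").split(",")
        if part.strip()
    ]
    # no-store forbids caching; no-cache forces a revalidation request each time.
    if "no-store" in directives or "no-cache" in directives:
        return False
    for directive in directives:
        if directive.startswith("max-age="):
            value = directive.split("=", 1)[1]
            return value.isdigit() and int(value) > 0
    # Without max-age, fall back to the older Expires header.
    return "expires" in normalized
```

Feed it the headers from one of your picture responses (for example as
printed by "curl -I"); if it returns False, every page view hits nginx again
for the same files.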



> I use netfilter with fail2ban, but not the connection tracking module!

Why not give it a try without it? I'm not sure fail2ban scales infinitely.



> I use
>
> https://github.com/munin-monitoring/contrib/blob/master/plugins/nginx/nginx-combined
>
> to get the statistics from stub_status_module.
>
> The call from nginx-combined_<IP-ALIAS> runs on the same machine as the
> nginx server.
>
> Due to this fact we have no external network traffic, just an ip alias
> call on eth2.
>
> Every time when I have more then ~400 r/s we get no data from the
> status-request, this request rate means ~20k Packets/Second.
>
> [...]
>
> I have now seen on the tcpdump that I get a 'RST' Package quite
> immediately after a request when the 'no answer from server' cames.

Are you dropping anything in iptables? Check the counters.

Are you sure the workers don't hit the maximum of 4096 connections?
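
One way to check this from the data you already collect: stub_status reports
both "accepts" and "handled", and according to the nginx documentation the two
are normally equal unless a resource limit such as worker_connections has been
reached. A minimal Python sketch (helper name is hypothetical) that parses the
status page text and reports the difference:

```python
import re


def parse_stub_status(text):
    """Parse the plain-text output of nginx's stub_status page.

    The "dropped" value is accepts minus handled; anything above zero
    suggests connections were refused because of a resource limit.
    """
    active = re.search(r"Active connections:[ \t]+(\d+)", text)
    counters = re.search(r"^[ \t]*(\d+)[ \t]+(\d+)[ \t]+(\d+)[ \t]*$", text, re.MULTILINE)
    if active is None or counters is None:
        raise ValueError("text does not look like stub_status output")
    accepts, handled, requests = (int(n) for n in counters.groups())
    return {
        "active": int(active.group(1)),
        "accepts": accepts,
        "handled": handled,
        "requests": requests,
        "dropped": accepts - handled,
    }
```

If "dropped" keeps growing while the graphs show gaps, the workers are
running out of connections and the limit (or the worker count) is too low.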



> Please can you help me to find the reason for the immediately 'RST'
> answer.

You can see from your dmesg that syncookies are kicking in. Does this happen
at the same moment you see the TCP resets? Work out whether this is a DoS or
normal traffic. If the latter is the case, you probably need some SYN backlog
tuning.
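
To correlate the resets with the listen queue, you can watch the TcpExt
counters in /proc/net/netstat. The Python sketch below assumes the usual
layout of that file (a header line of names followed by a line of values, both
prefixed "TcpExt:"); the function names are only illustrative.

```python
# Counters that indicate pressure on the listen/SYN queues.
WATCHED = ["SyncookiesSent", "ListenOverflows", "ListenDrops"]


def parse_tcpext(netstat_text):
    """Return the TcpExt counters from the text of /proc/net/netstat as a dict."""
    lines = netstat_text.splitlines()
    # The file comes in pairs: a line of counter names, then a line of values.
    for header, values in zip(lines, lines[1:]):
        if header.startswith("TcpExt:") and values.startswith("TcpExt:"):
            names = header.split()[1:]
            numbers = [int(v) for v in values.split()[1:]]
            return dict(zip(names, numbers))
    raise ValueError("no TcpExt section found")


def queue_pressure(before, after):
    """Growth of the watched counters between two samples."""
    return {name: after.get(name, 0) - before.get(name, 0) for name in WATCHED}
```

Sample the file before and during a traffic peak. If ListenOverflows or
SyncookiesSent climb exactly when the status requests fail, the backlog is
the place to look.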


Don't expect an answer here on the haproxy mailing list. You would be better
off sending that information to the nginx mailing list.



Regards,

Lukas                                     
