On Fri, Nov 20, 2009 at 01:55:37PM -0800, Jose Avila(Tachu) wrote:
> Thanks Willy, this usually happens when I send restart signals to a process
> while the older process has not finished.
> 
> e.g.
>
> process 1000 is currently active, so I send a restart to haproxy with -sf 1000
> the new process is 1001
> the killing of process 1000 takes, say, 30 secs; in that same 30 seconds my auto
> scale adds or removes another server and restarts haproxy with -sf 1001

OK so in fact you have multiple processes at a given instant when
this happens.

> I'm not sure which of the two gets into D state, but it only happens when I'm
> scaling more than 1 host at a time. The server is currently
> processing about 200k requests per minute, so it takes a bit to restart an
> instance.

It should not be long anyway, because the new process can bind to the
port as soon as the old one has released it, which is almost instant.
The fact that there are still established connections is irrelevant in
this case. What can be long is the time the old process remains alive.
It will stay here until the last session completes. But it will not
bother the new process.
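
To make the overlap concrete, here is a minimal sketch of how a reload
script could build the command line when several old processes are still
finishing. It assumes the calling script keeps track of the old PIDs
itself. The helper name build_reload_argv, the binary name and the config
path are hypothetical; -f and -sf are the regular haproxy options, and
-sf takes a list of PIDs placed last on the command line. Whether
signalling a process that is already finishing again is desirable is left
to the caller.

```python
def build_reload_argv(old_pids, config="/etc/haproxy/haproxy.cfg", binary="haproxy"):
    """Build an argv list that starts a new process and asks old ones to finish.

    old_pids: PIDs of every old process that may still be alive (assumption:
    the caller tracks these itself, e.g. from its own bookkeeping).
    """
    argv = [binary, "-f", config]
    # De-duplicate and sort so the resulting command line is stable.
    pids = sorted({int(p) for p in old_pids})
    if pids:
        # -sf and its PID list go at the end of the command line.
        argv.append("-sf")
        argv.extend(str(p) for p in pids)
    return argv


if __name__ == "__main__":
    # Two old processes (1000 and 1001) are still draining their sessions.
    print(" ".join(build_reload_argv([1000, 1001])))
```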

Oh I'm thinking about something. Check your free RAM when the problem
happens. It's very possible that having multiple concurrent processes 
makes your system swap, which would exactly cause a D state. This is
the reason I build with dlmalloc, because it is able to release unused
memory since it uses mmap().
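
A quick way to check this while the problem is happening is to look at
/proc/meminfo. The following is only a sketch, assuming a Linux system;
the function names parse_meminfo and swap_used_kb are made up for the
example, while MemFree, SwapTotal and SwapFree are standard
/proc/meminfo fields.

```python
def parse_meminfo(text):
    """Parse the contents of /proc/meminfo into a dict of name -> integer.

    Most fields are reported in kB; a few (such as the HugePages counters)
    are plain counts.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        fields = rest.split()
        if fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values


def swap_used_kb(info):
    """Return the amount of swap in use, in kB (0 if no swap is configured)."""
    return info.get("SwapTotal", 0) - info.get("SwapFree", 0)


if __name__ == "__main__":
    with open("/proc/meminfo") as f:
        info = parse_meminfo(f.read())
    print("MemFree kB:  ", info.get("MemFree"))
    print("Swap used kB:", swap_used_kb(info))
```

If the swap usage grows each time several processes overlap, that would
be consistent with the swapping explanation above.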

> I've changed my script to only add or remove 1 server at a time to see if that
> helps.
>
> On another note, I've been looking for a concise guide on what kernel
> parameters I can tweak to improve performance. I have to say I'm impressed
> already with how well it handles traffic, but I would like to try to
> squeeze a bit more out of it. At peaks, each of my load balancers is
> balancing about 100 backend servers and processing an average of 3k-5k
> requests per second.

Check the list archives, there have already been some posts on the
subject. The principle is always the same : don't use conntrack on
the system, increase somaxconn and tcp_max_syn_backlog, enlarge the
source ports range, set tcp_tw_reuse to 1 and reduce the default
tcp_rmem/tcp_wmem values. Once that's done, you can observe and
fine-tune further for your specific usage.
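
Before changing anything, it helps to record the current values. The
sketch below only reads the settings named above and suggests no numbers,
since those depend on the workload. It assumes a Linux system with the
usual /proc/sys layout. The sysctl names are the standard ones
(net.ipv4.ip_local_port_range being the source ports range); the helper
names sysctl_path and read_sysctl are made up for the example. The
conntrack advice is not covered here.

```python
import os

# Settings mentioned above, by their usual sysctl names.
SYSCTLS = [
    "net.core.somaxconn",
    "net.ipv4.tcp_max_syn_backlog",
    "net.ipv4.ip_local_port_range",
    "net.ipv4.tcp_tw_reuse",
    "net.ipv4.tcp_rmem",
    "net.ipv4.tcp_wmem",
]


def sysctl_path(name, root="/proc/sys"):
    """Map a dotted sysctl name to its file under /proc/sys."""
    return os.path.join(root, *name.split("."))


def read_sysctl(name, root="/proc/sys"):
    """Return the current value as a list of fields, or None if unreadable."""
    try:
        with open(sysctl_path(name, root)) as f:
            return f.read().split()
    except OSError:
        return None


if __name__ == "__main__":
    for name in SYSCTLS:
        print(name, "=", read_sysctl(name))
```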

Regards,
Willy

