Hi Javier,
thanks for your help.
On 20.05.2012 13:58, Javier Miguel Rodríguez wrote:
I know that you are NOT running RHEL / CentOS, but this problem with
1000 child processes bit us hard. Read this Red Hat kernel Bugzilla
entry (Timo has comments in it):
https://bugzilla.redhat.com/show_bug.cgi?id=681578
Maybe you are hitting the same limit?
Yes, maybe.
The only strange thing is that I don't see any errors in my Dovecot
logs. I don't see errors like "Panic: epoll_ctl" or anything similar.
I checked my kernel, and the patch mentioned in
https://bugzilla.redhat.com/show_bug.cgi?id=681578
(comment 31) is not applied. It is included in versions 3.0.30 and 3.2.17.
I will see what happens tomorrow under more load.
If the problem occurs again, I will give 3.2.17 a try.
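For anyone who wants to repeat the kernel check, here is a minimal
sketch in Python. It only encodes the two version numbers named above
(3.0.30 and 3.2.17); the names FIXED_IN and has_fix are hypothetical,
and for every other kernel series the function returns None because
this thread does not say whether the patch is included there.

```python
import platform
import re

# Fix versions as stated in this thread; other series are unknown here.
FIXED_IN = {(3, 0): 30, (3, 2): 17}


def has_fix(release):
    """Return True/False for 3.0.x and 3.2.x kernels, None otherwise."""
    m = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if m is None:
        return None
    major, minor = int(m.group(1)), int(m.group(2))
    patch = int(m.group(3) or 0)
    needed = FIXED_IN.get((major, minor))
    if needed is None:
        return None  # series not mentioned in the thread
    return patch >= needed


if __name__ == "__main__":
    rel = platform.release()  # same string as `uname -r`
    print(rel, has_fix(rel))
```

A distribution or vserver kernel may carry backported patches, so a
False result is only a hint and not proof that the patch is missing.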
thanks
Urban
Regards
Javier
On 20/05/2012 11:59, Urban Loesch wrote:
On 19.05.2012 21:05, Timo Sirainen wrote:
On Wed, 2012-05-16 at 08:59 +0200, Urban Loesch wrote:
The server had been running for about 1 year without any problems. The
15-minute load was between 0.5 and at most 8. No high IOWAIT. CPU idle
time was about 98%.
..
# iostat -k
Linux 3.0.28-vs2.3.2.3-rol-em64t (mailstore4)  16.05.2012  _x86_64_  (24 CPU)
Did you change the kernel just before it broke? I'd try another version.
The first time it broke with kernel 2.6.38.8-vs2.3.0.37-rc17.
Then I tried 3.0.28 and it broke again.
On Friday evening I disabled the cgroup feature completely, and so far
it seems to work normally.
But this could be because it is the weekend and there are not many
connections active right now, so I have to wait until Monday. If it
happens again I will try version 3.2.17.
On the other hand, it could be that the server is overloaded, because
this problem happens only when more than 1000 tasks are active. That
seems strange to me, because it has been working without problems for
1 year and we made no changes. Also, there were almost always more than
1000 tasks active over the last year and we had no problems.
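To watch for the 1000-task mark mentioned above, a rough sketch in
Python could count tasks by walking /proc. It assumes that "tasks"
means kernel threads of execution as listed under /proc/<pid>/task on
Linux; the function name count_tasks is hypothetical, and inside a
vserver guest the result only covers what that guest can see.

```python
import os


def count_tasks(proc_root="/proc"):
    """Count entries under <proc_root>/<pid>/task for all numeric pids."""
    total = 0
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "sys"
        try:
            total += len(os.listdir(os.path.join(proc_root, entry, "task")))
        except OSError:
            continue  # process exited in the meantime or access was denied
    return total


if __name__ == "__main__":
    print("tasks:", count_tasks())
```

Run from cron or a shell loop, this gives a number that can be compared
with the time the problem starts.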
thanks
Urban