Hey folks,

We've discovered an issue on Ubuntu/Lucid with libc6 2.11.1-0ubuntu7.5 (it
may also affect versions 2.11.1-0ubuntu7.1 through 2.11.1-0ubuntu7.4).
The bug affects systems when a large number of threads (or processes) are
created rapidly. Once triggered, the system will become completely
unresponsive for ten to fifteen minutes. We've seen this issue on our
production Cassandra clusters under high load. Cassandra seems particularly
susceptible to this issue because of the large thread pools that it creates.
In particular, we suspect the unbounded thread pool for connection
management may be pushing some systems over the edge.

We're still trying to narrow down what changed in libc that is causing this
issue. We also haven't tested things outside of Xen, or on non-x86
architectures. But if you're seeing these symptoms, you may want to try
upgrading libc6.
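
For anyone who wants to check a box quickly, here is a rough sketch (not
something we run in production) that compares the installed libc6 version
against the versions mentioned above. The function names and the
"confirmed"/"suspected" labels are made up for illustration, and it assumes
a Debian/Ubuntu system where dpkg-query is available.

```python
import re
import subprocess

# Versions named in this report: 7.5 is where we saw the problem,
# 7.1 through 7.4 are only suspected. This is an assumption-laden
# sketch, not an authoritative list of affected packages.
CONFIRMED = {5}
SUSPECTED = {1, 2, 3, 4}


def classify(version):
    """Return 'confirmed', 'suspected', or 'not listed' for a libc6 version string."""
    m = re.fullmatch(r"2\.11\.1-0ubuntu7\.(\d+)", version.strip())
    if not m:
        return "not listed"
    minor = int(m.group(1))
    if minor in CONFIRMED:
        return "confirmed"
    if minor in SUSPECTED:
        return "suspected"
    return "not listed"


def installed_libc6_version():
    """Ask dpkg for the installed libc6 version (Debian/Ubuntu only)."""
    result = subprocess.run(
        ["dpkg-query", "-W", "-f=${Version}\\n", "libc6"],
        capture_output=True,
        text=True,
        check=True,
    )
    # Take the first line in case more than one libc6 package is installed.
    return result.stdout.splitlines()[0]


if __name__ == "__main__":
    version = installed_libc6_version()
    print(version, "->", classify(version))
```

A result of "not listed" only means the version isn't one we named here; it
doesn't prove the system is unaffected.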

I'll send out an update if we find anything else interesting. If anyone has
any thoughts as to what the cause is, we're all ears!

Hope this saves someone some heart-ache,

Mike
