Hey folks,

We've discovered an issue on Ubuntu/Lenny with libc6 2.11.1-0ubuntu7.5 (it may also affect versions 2.11.1-0ubuntu7.1 through 2.11.1-0ubuntu7.4). The bug is triggered when a large number of threads (or processes) are created in rapid succession. Once triggered, the system becomes completely unresponsive for ten to fifteen minutes.

We've seen this on our production Cassandra clusters under high load. Cassandra seems particularly susceptible because of the large thread pools it creates. In particular, we suspect the unbounded thread pool for connection management may be pushing some systems over the edge.
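For anyone unsure what "unbounded" means in practice, here is a minimal Java sketch. It is not Cassandra's actual code; the class and method names (PoolSketch, unbounded, bounded, peakThreads) are made up for illustration. It only shows how a pool with no thread cap creates one thread per concurrent task, while a capped pool queues the excess.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class PoolSketch {
    // Same configuration that Executors.newCachedThreadPool() uses:
    // no core threads, no upper limit, direct hand-off of tasks.
    static ThreadPoolExecutor unbounded() {
        return new ThreadPoolExecutor(0, Integer.MAX_VALUE, 60L, TimeUnit.SECONDS,
                new SynchronousQueue<Runnable>());
    }

    // Fixed-size pool: at most maxThreads threads, extra tasks wait in a queue.
    static ThreadPoolExecutor bounded(int maxThreads) {
        return new ThreadPoolExecutor(maxThreads, maxThreads, 60L, TimeUnit.SECONDS,
                new LinkedBlockingQueue<Runnable>());
    }

    // Hypothetical helper: submits `tasks` jobs that all block until released,
    // then reports the largest number of threads the pool ever held.
    static int peakThreads(ThreadPoolExecutor pool, int tasks) throws InterruptedException {
        final CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < tasks; i++) {
            pool.execute(() -> {
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        int peak = pool.getLargestPoolSize();
        release.countDown();   // let every blocked task finish
        pool.shutdown();
        pool.awaitTermination(10, TimeUnit.SECONDS);
        return peak;
    }

    public static void main(String[] args) throws InterruptedException {
        // 200 simultaneous "connections" is an arbitrary, deliberately small number.
        System.out.println("unbounded peak threads: " + peakThreads(unbounded(), 200));
        System.out.println("bounded peak threads:   " + peakThreads(bounded(8), 200));
    }
}
```

With the uncapped pool, a burst of connections turns directly into a burst of thread creation, which is the pattern we suspect is involved here. Whether capping the pool actually avoids the hang is untested.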
We're still trying to narrow down what changed in libc to cause this. We also haven't tested outside of Xen or on non-x86 architectures. But if you're seeing these symptoms, you may want to try upgrading libc6.

I'll send out an update if we find anything else interesting. If anyone has any thoughts on the cause, we're all ears!

Hope this saves someone some heartache,
Mike