On Sun, May 05, 2019 at 07:07:21AM +0200, Willy Tarreau wrote:
> Thus I conclude that it crashed, and that all other threads just met at
> the same lock while the core was being dumped in this one.

Or maybe the tree got corrupted and __eb_insert_dup() entered an endless
loop. If that's the case (I mean if it froze and didn't crash), I may
have something to make this safer soon. I more or less managed to create
a watchdog timer to detect lockups and abort the whole process with a
trace when this happens. This will avoid keeping a faulty process in
prod and may even allow a quicker restart. I don't intend to backport
it to 1.9, though depending on how effective and helpful it is, I could
change my mind. In any case, I don't want to use such a mechanism to
sweep problems under the carpet, but rather to capture detailed traces
without requiring human intervention when this happens.

Willy
