On Sun, May 05, 2019 at 07:07:21AM +0200, Willy Tarreau wrote:
> Thus I conclude that it crashed, and that all other threads just met at
> the same lock while the core was being dumped in this one.
Or maybe the tree got corrupted and __eb_insert_dup() entered an endless loop. If that's the case (I mean if it froze and didn't crash), I may have something to make this safer soon. I more or less managed to create a watchdog timer to detect lockups and abort the whole process with a trace when this happens. This will avoid keeping a faulty process in prod and may even allow a quicker restart.

I don't intend to backport it to 1.9, though depending on how effective and helpful it turns out to be, I could change my mind. In any case, I don't want to use such solutions to sweep the dust under the carpet, but rather to collect detailed traces without requiring human intervention when this happens.

Willy
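For readers wondering how a watchdog can tell a lockup (e.g. an endless loop in a tree insert) from normal operation, here is a minimal sketch of the general idea, not necessarily how the HAProxy watchdog is implemented. It assumes a hypothetical global progress counter `loop_counter` that the event loop bumps on every iteration; the names `wdt_check`, `wdt_thread`, `wdt_start`, the 1-second period and the 3-check limit are all made up for illustration. A separate thread samples the counter, and if it has not moved for several consecutive checks, it reports and calls abort() so a core dump is left for analysis and the process can be restarted.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical progress counter: the event loop would increment it
 * once per loop iteration. */
static atomic_ulong loop_counter;

/* Hypothetical watchdog state kept between two checks. */
struct wdt_state {
    unsigned long last_seen;  /* counter value seen at the previous check */
    unsigned int stalled;     /* consecutive checks without any progress */
};

/* Returns 1 when no progress was observed for `limit` consecutive checks,
 * meaning the loop is presumed stuck; returns 0 otherwise. */
static int wdt_check(struct wdt_state *st, unsigned long now, unsigned int limit)
{
    if (now != st->last_seen) {
        /* progress was made: remember the new value and reset the streak */
        st->last_seen = now;
        st->stalled = 0;
        return 0;
    }
    return ++st->stalled >= limit;
}

/* Monitor thread: samples the counter once per second and aborts the
 * whole process (producing a core dump) after 3 seconds without progress. */
static void *wdt_thread(void *arg)
{
    struct wdt_state st = { .last_seen = atomic_load(&loop_counter), .stalled = 0 };

    (void)arg;
    for (;;) {
        sleep(1);
        if (wdt_check(&st, atomic_load(&loop_counter), 3)) {
            fprintf(stderr, "watchdog: no progress for 3s, aborting\n");
            abort();
        }
    }
    return NULL;
}

/* Starts the monitor thread; returns 0 on success, like pthread_create(). */
static int wdt_start(void)
{
    pthread_t tid;

    return pthread_create(&tid, NULL, wdt_thread, NULL);
}
```

Keeping the decision in the pure `wdt_check()` function makes the lockup rule easy to test in isolation, while the thread only handles timing and the abort itself.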