Hi Robin, sorry for the delay, we've been quite busy these last days :-/

On Mon, Aug 09, 2021 at 09:06:36PM +0000, Robin H. Johnson wrote:
> After months searching, at work we stumbled onto an internally usable-only
> reproduction case using a tool we wrote that made millions of requests:
> Turning it up around ~6K RPS w/ lots of the headers being processed by our
> Lua code triggered the issue, running on a single-sock EPYC 7702P system.

OK!

> We also found a surprising mitigation: enabling multithreaded Lua w/
> "lua-load-per-thread" made the problem go away entirely (and gave a modest
> 10% performance boost, we are mostly limited by backend servers, not HAProxy
> or Lua).

Then I'm sorry that it was not spotted earlier, because this was a known
limitation of Lua in the pre-2.4 versions: the Lua code runs on a single
threaded stack, and by default as there is a single stack, when you have
too many threads, some are waiting ages to try to get access to the CPU,
to the point of possibly spinning more than 2 seconds there (which is an
eternity for a CPU).

BTW maybe we should arrange to take the Lua lock inside an externally
visible function that could be resolved. It would more easily show up in
case of trouble so that the issue becomes more obvious.

And that's exactly why lua-load-per-thread was introduced. It creates one
independent stack per thread so that there is no more locking.

I suspect that limiting the number of Lua instructions executed in a call
could have reduced the probability to keep Lua on a thread for too long,
but that equates to playing Russian roulette, and if you could switch to
lua-load-per-thread it's way better.

I had started some work a few months ago to implement latency-bounded
locks that avoid the trouble of NUMA systems (and even any system with a
non-totally-uniform L3 cache) where some groups of threads can hinder
other groups for a while. When I'm done with this I guess the Lua lock
will be a good candidate for it!
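
For anyone finding this thread later, the switch to lua-load-per-thread
discussed above amounts to a one-line change in the global section. A
minimal sketch, assuming HAProxy 2.4 or later; the script path is made up
for illustration and is not taken from Robin's setup:

```
global
    # Pre-2.4 style: a single Lua state shared by all threads, so every
    # thread has to take the Lua lock before running the script.
    # lua-load /etc/haproxy/headers.lua

    # 2.4+: one independent Lua state per thread, no shared lock.
    # (hypothetical path, adjust to your own script)
    lua-load-per-thread /etc/haproxy/headers.lua
```

Since each thread then gets its own state, the script must not expect
globals to be shared between threads.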

> The Lua script was described in the previous script, and only does complex
> string parsing, used for variables, and driving some applets. It doesn't do
> any blocking operations, sockets, files or rely on globals. It got a few
> cleanups for multi-threaded usage (forcing more variables to be explicitly
> local), but has no other significant changes relevant to this discussion
> (it had some business logic changes to string handling used to compute
> stick table keys, but not really functionality changes).

I'm really glad that you managed to make it thread-safe with limited
changes, as that was our hope when we designed it like this with Thierry!

> The full errors are attached along with decoded core dump, with some details
> redacted per $work security team requirements.
> Repeated the error twice and both attempts are attached, 4 files in total.
> I'll repeat the short form here for interest from just one of the occurrences:

Many thanks for sharing all this, it will certainly help others.

(...)

Thanks!
Willy

