Hi Robin,

sorry for the delay, we've been quite busy these last days :-/

On Mon, Aug 09, 2021 at 09:06:36PM +0000, Robin H. Johnson wrote:
> After months searching, at work we stumbled onto an internally usable-only
> reproduction case using a tool we wrote that made millions of requests:
> Turning it up around ~6K RPS w/ lots of the headers being processed by our
> Lua code triggered the issue, running on a single-sock EPYC 7702P system.

OK!

> We also found a surprising mitigation: enabling multithreaded Lua w/
> "lua-load-per-thread" made the problem go away entirely (and gave a modest 10%
> performance boost, we are mostly limited by backend servers, not HAProxy or
> Lua).

Then I'm sorry that it was not spotted earlier, because this was a known
limitation of Lua in the pre-2.4 versions: the Lua code runs on a
single-threaded stack protected by a lock and, since by default there is
only one stack, when you have too many threads some of them wait ages to
get access to it, to the point of possibly spinning more than 2 seconds
there (which is an eternity for a CPU).

BTW maybe we should arrange to take the Lua lock inside an externally
visible function whose name could be resolved. It would then show up more
easily in case of trouble, so that the issue becomes more obvious.

And that's exactly why lua-load-per-thread was introduced. It creates one
independent stack per thread so that there is no more locking. I suspect
that limiting the number of Lua instructions executed in a call could
have reduced the probability of keeping Lua on a thread for too long, but
that equates to playing Russian roulette, and since you could switch to
lua-load-per-thread, that's way better. I had started some work a few months
ago to implement latency-bounded locks that avoid the trouble of NUMA
systems (and even any system with a non-totally-uniform L3 cache) where
some groups of threads can hinder other groups for a while. When I'm done
with this I guess the Lua lock will be a good candidate for it!
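
For anyone else hitting this, a minimal sketch of what the switch looks
like in the global section. The script path and thread count below are
made up for illustration and do not come from your setup, and the
forced-yield value is only shown as the knob I was referring to above,
not as a recommendation:

```
global
    # hypothetical thread count, adjust to the machine
    nbthread 64

    # pre-2.4 way: a single Lua state shared by all threads, hence the lock
    # lua-load /etc/haproxy/headers.lua

    # 2.4 and later: one independent Lua state per thread, no shared lock
    # (hypothetical path)
    lua-load-per-thread /etc/haproxy/headers.lua

    # optional: number of Lua instructions executed before a forced yield
    # (illustrative value only)
    tune.lua.forced-yield 10000
```

Note that with lua-load-per-thread each thread gets its own copy of the
script's state, so the script must not rely on globals shared between
threads.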

> The Lua script was described in the previous script, and only does complex
> string parsing, used for variables, and driving some applets. It doesn't do
> any blocking operations, sockets, files or rely on globals. It got a few
> cleanups for multi-threaded usage (forcing more variables to be explicitly
> local), but has no other significant changes relevant to this discussion (it
> had some business logic changes to string handling used to compute stick
> table keys, but not really functionality changes).

I'm really glad that you managed to make it thread-safe with limited
changes, as that was our hope when we designed it like this with Thierry!

> The full errors are attached along with decoded core dump, with some details
> redacted per $work security team requirements.
> Repeated the error twice and both attempts are attached, 4 files in total.
> I'll repeat the short form here for interest from just one of the occurrences:

Many thanks for sharing all this, it will certainly help others.
(...)

Thanks!
Willy
