On Tue, Jul 4, 2017 at 11:39 AM, Willy Tarreau <w...@1wt.eu> wrote:
>
> But what is wrong with stopping the loop as soon as the distance gets
> larger than the stack_guard_gap ?

Absolutely nothing. But that's not the problem with the loop. Let's say
that you are using lots of threads, so that you know your stack space
is limited. What you do is to use MAP_FIXED a lot, and you lay out your
stacks fairly densely (with each other, but also possibly with other
mappings), with that PROT_NONE redzoning mapping in between the "dense"
allocations.

So when the kernel wants to grow the stack, it finds the PROT_NONE
redzone mapping - but there's possibly other maps right under it, so
the stack_guard_gap still hits other mappings.

And the fact that this seems to trigger with

 (a) 32-bit x86

 (b) Java

actually makes sense in the above scenario: that's _exactly_ when you'd
have dense mappings. Java is very thread-happy, and in a 32-bit VM, the
virtual address space allocation for stacks is a primary issue with
lots of threads.

Of course, the downside to this theory is that apparently the Java
problem is not confirmed to actually be due to this (Ben root-caused
the rust thing on ppc64), but it still sounds like quite a reasonable
thing to do.

The problem with the Java issue may be that they do that "dense stack
mappings in VM space" (for all the usual "lots of threads, limited VM"
reasons), but they may *not* have that PROT_NONE redzoning at all.

So the patch under discussion works for Rust exactly *because* it does
its redzone to show "this is where I expect the stack to end". The i386
java load may simply not have that marker for us to use..

             Linus
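
For illustration only, the dense layout described above (stacks placed
back to back with MAP_FIXED, separated only by a PROT_NONE redzone) can
be sketched from userspace roughly as follows. This is not the code any
particular runtime uses: the struct, the function names and the sizes
(NSTACKS, STACK_PAGES, GUARD_PAGES) are all made up for the example. It
reserves one region first so that MAP_FIXED only ever replaces its own
mapping, and then reports how far the next mapping sits below a given
stack.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical sizes, chosen only for illustration. */
#define NSTACKS     4
#define STACK_PAGES 16  /* usable pages per thread stack */
#define GUARD_PAGES 1   /* PROT_NONE redzone below each stack */

struct dense_stacks {
    char  *base;  /* start of the whole reservation */
    size_t page;  /* system page size in bytes */
    size_t slot;  /* bytes per (redzone + stack) slot */
};

/* Lowest usable address of stack i (just above its redzone). */
char *stack_low(const struct dense_stacks *ds, int i)
{
    return ds->base + (size_t)i * ds->slot + GUARD_PAGES * ds->page;
}

/* One past the highest address of stack i. */
char *stack_high(const struct dense_stacks *ds, int i)
{
    return stack_low(ds, i) + STACK_PAGES * ds->page;
}

/*
 * Reserve one PROT_NONE region, then map read/write stacks into it
 * with MAP_FIXED. The pages left untouched between the stacks stay
 * PROT_NONE and act as the redzones. Returns 0 on success, -1 on error.
 */
int dense_stacks_init(struct dense_stacks *ds)
{
    ds->page = (size_t)sysconf(_SC_PAGESIZE);
    ds->slot = (STACK_PAGES + GUARD_PAGES) * ds->page;

    ds->base = mmap(NULL, NSTACKS * ds->slot, PROT_NONE,
                    MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (ds->base == MAP_FAILED)
        return -1;

    for (int i = 0; i < NSTACKS; i++) {
        char *want = stack_low(ds, i);
        void *got = mmap(want, STACK_PAGES * ds->page,
                         PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0);
        if (got != (void *)want) {
            munmap(ds->base, NSTACKS * ds->slot);
            return -1;
        }
    }
    return 0;
}

/*
 * Distance from the bottom of stack i (i >= 1) down to the top of the
 * neighbouring stack i-1, i.e. the size of the redzone between them.
 */
size_t gap_below(const struct dense_stacks *ds, int i)
{
    return (size_t)(stack_low(ds, i) - stack_high(ds, i - 1));
}
```

With this layout gap_below() is just GUARD_PAGES pages. That is far
less than the default stack_guard_gap of 256 pages, so a check that
looks past the redzone will still find the neighbouring mapping inside
the gap.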