https://issues.dlang.org/show_bug.cgi?id=17914
Nemanja Boric <[email protected]> changed: What |Removed |Added ---------------------------------------------------------------------------- CC| |[email protected] --- Comment #7 from Nemanja Boric <[email protected]> --- (In reply to briancschott from comment #5) > > The problem is, this either break toy code that just > > creates lots of useless Fibers or heavy production code > > that actually needs 30K Fibers. > > That's not entirely accurate. I found this because the range implementation > for a complex multi-dimensional tree structure was implemented as a > std.concurrency.Generator. The problem is caused by doing many tree > traversals in a loop on a machine that has 512GB of memory. We don't > actually need 30k fibers. We need one fiber at a time 30k times in a row. By > the time we start using the second fiber we don't need the first one anymore. > > It's possible to re-implement the range in question, but the amount of code > required to do so is much greater than the current implementation using > recursion and fibers. Is it not possible just to use a simple pool for the Fibers? So, instead of allocating a new fiber, you reset the recycled one to the new state. Is this not possible with the std.concurrency.Generator? > If we really need a comprise here, I'd definitely increase the stack size > further, e.g. to 64KiB (vibe.d's default). This could be a problem on 32bit systems, where this would suck available address space very quickly, if I'm not mistaken. Of course, we could still make it a default for 64bit systems. > A possible idea is putting a pattern in the guard data (leave it read/write), > and check the guard data on fiber deallocation, and optionally on every yield > (could be a performance issue, but maybe only used in debug mode?) One problem I can see with this is that it doesn't show the problem on the spot - good thing about protected region is that we die as soon as we try accessing it. If the fiber does a lot of calls before yields, it may be that we're far from the place that actually corrupted memory, and it's not that easy to debug. Of course, it still prevents memory corruption (but at a cost that grows linearly with the number of fibers). > I'd err on the side of memory safety. Stack overflows are an actual problem > with Fibers and hard&time-consuming to debug. Yes. I can't emphasize enough how many times we had these issues. And it's never been a problem for us to confirm that this is what's happening with the guard page in. It was always a hell to confirm what's going on with the previous runtime, mostly because of the fact that it's not the fiber that's actually corrupting memory the one it will crash, but some other poor fiber that happens to have a stack adjacent to the rouge one. Having said all this - why not just keep the default guard page on, but if somebody is confident they don't need it (and they could even do a `version(debug){}else{guardPageOff}`), simply providing 0 for the guard size in the Fiber's constructor will turn this entire thing off? Perhaps we could document this better (as I didn't expect this issue to raise when implementing this, not much documentation is there). However, I also thing Steven has a valid point when he says: > Except this breaks existing code that may not have a stack overflow problem. > I hate to say it, but unfortunately, we are stuck with the status quo for now. --
