https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104893

Tom de Vries <vries at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |WORKSFORME
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #2 from Tom de Vries <vries at gcc dot gnu.org> ---
(In reply to Tom de Vries from comment #1)
> (In reply to Tom de Vries from comment #0)
> > The per-thread call stack is handled for .local memory by the CUDA driver.
> >
> > For the 'soft stack' that's not the case.
>
> Hmm, actually there's .local memory used, just not "directly".  Possibly the
> documentation needs updating to point that out.
>
> So, there doesn't seem to be an issue related to overlapping storage.
>
> So I wonder, is the stack pointer also per thread then?  Or still per-warp?

OK, here ( https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97203#c6 ) we read:
...
The pointer is switched between per-warp global memory and per-lane local
memory.
...

So, I think this should be fine then.

Marking this resolved-worksforme until we run into an actual failing test-case.