https://gcc.gnu.org/bugzilla/show_bug.cgi?id=104893

Tom de Vries <vries at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |WORKSFORME
             Status|UNCONFIRMED                 |RESOLVED

--- Comment #2 from Tom de Vries <vries at gcc dot gnu.org> ---
(In reply to Tom de Vries from comment #1)
> (In reply to Tom de Vries from comment #0)
> > The per-thread call stack is handled for .local memory by the CUDA driver.
> > 
> > For the 'soft stack' that's not the case.
> 
> Hmm, actually there's .local memory used, just not "directly".  Possibly the
> documentation needs updating to point that out.
> 
> So, there doesn't seem to be an issue related to overlapping storage.
> 
> So I wonder, is the stack pointer also per thread then? Or still per-warp?

OK, here ( https://gcc.gnu.org/bugzilla/show_bug.cgi?id=97203#c6 ) we read:
...
The pointer is switched between per-warp global memory and per-lane local
memory.
...

So, I think this should be fine then.

Marking this resolved-worksforme until we run into an actual failing test-case.
