On Thu, Nov 12, 2015 at 04:58:21PM +0300, Alexander Monakov wrote:
> I'm proposing the following patch as a step towards resolving the issue with
> inaccessibility of stack storage (.local memory) in PTX to other threads than
> the one using that stack.  The idea is to have preallocated stacks, and have
> __nvptx_stacks[] array in shared memory hold current stack pointers.  Each
> thread is maintaining __nvptx_stacks[tid.y] as its stack pointer, thus for
> OpenMP the intent is to preallocate on a per-warp basis (not per-thread).
> For OpenMP SIMD regions we'll have to ensure that conflicting accesses are not
> introduced.
> 
> I've exposed a new command-line option -msoft-stack to ease testing, but for
> OpenMP we'll have to automatically flip it based on function attributes.
> Right now it's not easy because OpenMP and OpenACC both use "omp declare
> target".  Jakub, I seem to recall a discussion about OpenACC changing to use a
> separate attribute, but I cannot find it now.  Any advice here?

I believe OpenACC has acc routine {gang,worker,seq}, which would roughly match
whether a given OpenMP declare target function (or ompfn region) is/can be
called within the target/teams/distribute context, the parallel context, or
the simd context.  For OpenMP we have no such pragmas, so we need some
analysis to help PTX (and, as Martin said on IRC, apparently HSA too) and add
attributes accordingly.
For the .ompfn* outlined regions it is easy: there we know which construct
each one comes from.  For other functions I bet we want to do some IPA
analysis: start with the .ompfn* functions marked, walk the cgraph, and for
declare target functions not callable from outside try to determine whether
they are only called from parallel contexts or not.
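
A minimal sketch of such a walk, in plain C rather than GCC internals.
Everything named here (struct fn_node, the CTX_* bits, propagate_contexts,
only_parallel) is hypothetical, and the propagation rule is a simplifying
assumption: a callee inherits every context of its callers, and a function
callable from outside is conservatively treated as reachable from all
contexts.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical context bits; not GCC's actual representation. */
enum { CTX_TEAMS = 1, CTX_PARALLEL = 2, CTX_SIMD = 4,
       CTX_ALL = CTX_TEAMS | CTX_PARALLEL | CTX_SIMD };

#define MAX_CALLEES 4

struct fn_node {
  const char *name;
  bool externally_callable;  /* may be called from outside this unit */
  unsigned seed_ctx;         /* nonzero only for outlined .ompfn* regions */
  unsigned ctx;              /* result: contexts the function may run in */
  int callees[MAX_CALLEES];  /* indices into the graph array */
  int n_callees;
};

/* Seed every node, then push caller contexts down to callees until
   nothing changes (a simple fixed-point iteration over the call graph). */
void propagate_contexts(struct fn_node *g, int n)
{
  for (int i = 0; i < n; i++)
    g[i].ctx = g[i].seed_ctx | (g[i].externally_callable ? CTX_ALL : 0u);

  bool changed = true;
  while (changed) {
    changed = false;
    for (int i = 0; i < n; i++)
      for (int k = 0; k < g[i].n_callees; k++) {
        int c = g[i].callees[k];
        assert(c >= 0 && c < n);
        unsigned merged = g[c].ctx | g[i].ctx;
        if (merged != g[c].ctx) {
          g[c].ctx = merged;
          changed = true;
        }
      }
  }
}

/* True if the function is reachable only from parallel contexts. */
bool only_parallel(const struct fn_node *f)
{
  return f->ctx == CTX_PARALLEL;
}
```

A function reached only from a parallel-region .ompfn* ends up with
ctx == CTX_PARALLEL, while one shared between a teams region and a parallel
region gets both bits and so would need the conservative treatment.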

Does your patch affect all the stack allocations within a given function
(i.e. is there no way to select on a per-variable basis which stack a
variable is allocated on)?  Even without any detailed analysis, e.g. spilled
(non-addressable) vars could at least go to the local stack.  But PTX doesn't
have any spills, right?  Not sure about HSA.  If it is a per-function thing
only, then it isn't worth doing more detailed analysis at ompexp time.

BTW, it will surely be an advantage if PTX can support alloca through this;
the compiler could e.g. turn on -msoft-stack automatically for all functions
that use alloca/VLAs.
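
To illustrate why a soft stack makes alloca straightforward, here is a
host-side C simulation, not PTX and not the patch's actual code generation.
The names soft_stacks, stack_storage, soft_alloca, soft_stack_save and
soft_stack_restore are hypothetical, as are the sizes and the 16-byte
alignment; soft_stacks[] merely stands in for the __nvptx_stacks[] array
described in the quoted mail, indexed per warp.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_WARPS   4      /* assumed number of warps per block */
#define STACK_BYTES 1024   /* assumed preallocated size per warp */

/* Preallocated backing storage, one region per warp. */
static _Alignas(16) unsigned char stack_storage[NUM_WARPS][STACK_BYTES];

/* Stand-in for the shared-memory array of current stack pointers. */
static unsigned char *soft_stacks[NUM_WARPS];

/* Point every warp's stack pointer at the top of its region. */
void soft_stack_init(void)
{
  for (unsigned w = 0; w < NUM_WARPS; w++)
    soft_stacks[w] = stack_storage[w] + STACK_BYTES;
}

/* alloca-like allocation: bump the warp's stack pointer downward by the
   size rounded up to 16 bytes.  Returns NULL if the region is exhausted. */
void *soft_alloca(unsigned warp, size_t size)
{
  size_t rounded = (size + 15) & ~(size_t)15;
  size_t avail = (size_t)(soft_stacks[warp] - stack_storage[warp]);
  if (rounded < size || rounded > avail)
    return NULL;
  soft_stacks[warp] -= rounded;
  return soft_stacks[warp];
}

/* Function prologue/epilogue equivalents: remember the stack pointer on
   entry and restore it on exit, which releases all alloca'd memory. */
unsigned char *soft_stack_save(unsigned warp)
{
  return soft_stacks[warp];
}

void soft_stack_restore(unsigned warp, unsigned char *saved)
{
  soft_stacks[warp] = saved;
}
```

Because the stack pointer is an ordinary value held in memory, a
variable-sized allocation is just a subtraction, and restoring the saved
pointer at function exit frees everything allocated in between.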

        Jakub
