On 12/01/2015 04:28 PM, Alexander Monakov wrote:
> I'm taking a different approach.  I want to execute all insns in all warp
> members, while ensuring that the effect (on global and local state) is the same
> as if any single thread were executing that instruction.  Most instructions
> automatically satisfy that: if threads have the same state, then executing an
> arithmetic instruction, a normal memory load/store, etc. keeps local state the
> same in all threads.
>
> The two exceptional insn categories are atomics and calls.  For calls, we can
> demand recursively that they uphold this execution model, until we reach
> runtime-provided "syscalls": malloc/free/vprintf.  Those we can handle like
> atomics.

Didn't we also conclude that taking an address (say, of a stack variable) is an operation whose result differs between threads, so the state would no longer be the same?

Have you tried using the mechanism we use for OpenACC? IMO that would be a good first step: get things working with fewer changes, and then look into optimizing them (ideally for both OpenMP and OpenACC).


Bernd
