On Thu, 21 May 2015 13:57:00 +0200 Jakub Jelinek <ja...@redhat.com> wrote:
> On Thu, May 21, 2015 at 01:42:11PM +0200, Bernd Schmidt wrote:
> > This uses the patch I committed yesterday which introduces warp
> > broadcasts to implement the vector-single predication needed for
> > OpenACC. Outside a loop with vector parallelism, only one of the
> > threads representing a vector must execute, the others follow
> > along. So we skip the real work in each basic block for the
> > inactive threads, then broadcast the direction to take in the
> > control flow graph from the active one, and jump as a group.
> >
> > This will get extended with similar functionality for
> > worker-single. Julian is working on some patches on top of that to
> > ensure the later optimizers don't destroy the control flow - we
> > really need the threads to reconverge and perform the
> > broadcast/jump in lockstep.
> >
> > Committed on gomp-4_0-branch.
>
> What do you do with function calls?
> Do you call them just in the (tid.x & 31) == 0 threads (then they
> can't use vectorization), or for all threads (then it is an ABI
> change, they would need to know whether they are called this way and
> depending on that handle it similarly (skip all the real work, except
> for function calls, for (tid.x & 31) != 0, unless it is a vectorized
> region). Or is OpenACC restricting this to statements in the
> constructs directly (rather than anywhere in the region)?

OpenACC handles function calls specially (calling them "routines" -- of
varying sorts: gang, worker, vector or seq, affecting where they can be
invoked from). The plan is that all threads will call such routines --
and then some threads will be "neutered" as appropriate within the
routines themselves. That's not actually implemented yet, though.

Julian
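
For illustration, the vector-single scheme Bernd describes can be
modelled on the host in plain C. This is only a sketch under
assumptions, not the code GCC emits: the 32 lanes of a warp are
simulated with arrays, warp_broadcast and run_vector_single are
hypothetical names, and lane 0 is taken to be the active thread,
matching the (tid.x & 31) == 0 test quoted above.

```c
#include <assert.h>

#define WARP_SIZE 32

/* Hypothetical stand-in for a warp broadcast: every lane receives
   the value held by lane 0, the active lane. */
static void warp_broadcast(int value[WARP_SIZE])
{
    for (int lane = 1; lane < WARP_SIZE; ++lane)
        value[lane] = value[0];
}

/* Simulate a vector-single do-while loop that runs n times (n >= 1).
   work_done[lane] counts how often each lane did the real work.
   Returns the number of times the warp went around the loop together. */
static int run_vector_single(int n, int work_done[WARP_SIZE])
{
    int take_branch[WARP_SIZE];
    int counter = 0;      /* state touched only by the active lane */
    int trips = 0;

    assert(n >= 1);
    for (int lane = 0; lane < WARP_SIZE; ++lane)
        work_done[lane] = 0;

    for (;;) {
        /* Basic block body: inactive lanes skip the real work. */
        for (int lane = 0; lane < WARP_SIZE; ++lane) {
            take_branch[lane] = 0;
            if ((lane & 31) == 0) {
                work_done[lane]++;
                counter++;
                take_branch[lane] = counter < n;
            }
        }

        /* Broadcast the active lane's branch direction ... */
        warp_broadcast(take_branch);
        trips++;

        /* ... and jump as a group: all lanes now agree, so checking
           any one of them decides for the whole warp. */
        if (!take_branch[WARP_SIZE - 1])
            break;
    }
    return trips;
}
```

Calling run_vector_single(3, work) with an int work[WARP_SIZE] should
leave work[0] at 3 and every other entry at 0, with the whole warp
going around the loop three times. The exit test deliberately reads the
last lane rather than lane 0, to show that the decision only works
because of the broadcast.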