Re: [gomp4] Vector-single predication

Julian Brown Thu, 21 May 2015 06:06:43 -0700

On Thu, 21 May 2015 13:57:00 +0200
Jakub Jelinek <ja...@redhat.com> wrote:


> On Thu, May 21, 2015 at 01:42:11PM +0200, Bernd Schmidt wrote:
> > This uses the patch I committed yesterday which introduces warp
> > broadcasts to implement the vector-single predication needed for
> > OpenACC. Outside a loop with vector parallelism, only one of the
> > threads representing a vector must execute, the others follow
> > along. So we skip the real work in each basic block for the
> > inactive threads, then broadcast the direction to take in the
> > control flow graph from the active one, and jump as a group.
> > 
> > This will get extended with similar functionality for
> > worker-single. Julian is working on some patches on top of that to
> > ensure the later optimizers don't destroy the control flow - we
> > really need the threads to reconverge and perform the
> > broadcast/jump in lockstep.
> > 
> > Committed on gomp-4_0-branch.
> 
> What do you do with function calls?
> Do you call them just in the (tid.x & 31) == 0 threads (then they
> can't use vectorization), or for all threads (then it is an ABI
> change, they would need to know whether they are called this way and
> depending on that handle it similarly (skip all the real work, except
> for function calls, for (tid.x & 31) != 0, unless it is a vectorized
> region)). Or is OpenACC restricting this to statements in the
> constructs directly (rather than anywhere in the region)?

OpenACC handles function calls specially, calling them "routines". These
come in several sorts (gang, worker, vector or seq), and the sort affects
where a routine can be invoked from. The plan is that all threads will
call such routines, and then some threads will be "neutered" as
appropriate within the routines themselves.

That's not actually implemented yet, though.
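
As a rough illustration of the scheme discussed above, here is a host-side
C simulation of one basic block run in vector-single mode. It is only a
sketch, not what the compiler emits: the 32-lane warp size, the names
warp_sim, broadcast_from_lane0 and run_block_vector_single, and the
"x > 10" branch condition are all made up for the example. A routine
called by every thread could neuter its inactive threads in the same way.

```c
#include <assert.h>

#define WARP_SIZE 32  /* assumption: one vector maps to one 32-lane warp */

/* Hypothetical bookkeeping for the simulation. */
struct warp_sim {
    int work_done;   /* how many lanes executed the "real work" */
    int took_then;   /* how many lanes followed the "then" edge */
    int took_else;   /* how many lanes followed the "else" edge */
};

/* Stand-in for a warp broadcast: every lane reads the value held by lane 0. */
static int broadcast_from_lane0(const int value[WARP_SIZE])
{
    return value[0];
}

/* One basic block ending in a conditional branch, run in vector-single mode. */
static struct warp_sim run_block_vector_single(int x)
{
    struct warp_sim s = {0, 0, 0};
    int cond[WARP_SIZE] = {0};
    int lane;

    /* Phase 1: only the active lane, (tid.x & 31) == 0, does the real work;
       the other lanes skip it and just follow along. */
    for (lane = 0; lane < WARP_SIZE; lane++) {
        if ((lane & (WARP_SIZE - 1)) == 0) {
            s.work_done++;
            cond[lane] = (x > 10);   /* made-up branch condition */
        }
    }

    /* Phase 2: the lanes reconverge, receive the branch direction from the
       active lane, and jump as a group. */
    for (lane = 0; lane < WARP_SIZE; lane++) {
        int dir = broadcast_from_lane0(cond);
        if (dir)
            s.took_then++;
        else
            s.took_else++;
    }

    /* Every lane must have taken the same edge of the control flow graph. */
    assert(s.took_then == WARP_SIZE || s.took_else == WARP_SIZE);
    return s;
}
```

In this sketch, run_block_vector_single(42) performs the real work once
and sends all 32 lanes down the "then" edge, while
run_block_vector_single(3) sends them all down the "else" edge.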

Julian
