On 9/18/20 1:25 PM, Andrew Stubbs wrote:
> This patch fixes a problem in which nested OpenMP parallel regions cause
> errors if the number of inner teams is not balanced (i.e. the number of
> loop iterations is not divisible by the number of physical threads). A
> testcase is included.
> 
> On NVPTX the symptom was a fatal error:
> 
> libgomp: cuCtxSynchronize error: an illegal instruction was encountered
> 
> This was caused by mismatched "bar.sync" instructions (one waiting for
> 32 threads while another was waiting for 256). The source of the
> mismatch was that some threads were still busy while others had run out
> of work to do.
> 
> On GCN there was no such error (GCN barriers always wait for all
> threads), but it worked only by chance: the idle threads were "matching"
> different barriers to the busy threads, but it was harmless because the
> thread function pointer remained NULL.
> 
> This patch simply skips barriers when they would "wait" for only one
> thread (the current thread). This means that teams nested inside other
> teams now run independently, instead of strictly in lock-step, and is
> only valid as long as inner teams are limited to one thread each
> (currently the case).

Is this inner-team-one-thread-limit coded or documented somewhere?

If so, it might be good to add a comment there referring to the code
this patch adds.

Follow-up patch is OK, thanks.
- Tom

> When the inner regions exit, the barriers for the outer region will
> sync everything up again.
> 
> OK to commit?
> 
> Andrew
> 
> P.S. I can approve the amdgcn portion myself; I'm seeking approval for
> the nvptx portion.
