On Wed, 24 Jan 2018, Tom de Vries wrote:

> Hi,
>
> this patch adds a workaround for the nvptx target JIT bug PR83589 - "[nvptx]
> mode-transitions.c and private-variables.{c,f90} execution FAILs at
> GOMP_NVPTX_JIT=-O0".
>
>
> When compiling a branch-around-nothing (where the branch is warp neutering, so
> it's a divergent branch):
> ...
>     .reg .pred %r36;
>     {
>         .reg .u32 %x;
>         mov.u32 %x,%tid.x;
>         setp.ne.u32 %r36,%x,0;
>     }
>
>     @ %r36 bra $L5;
> $L5:
> ...
>
> The JIT fails to generate a convergence point here:
> ...
>     /*0128*/ @P0 BRA `(.L_1);
> .L_1:
> ...
>
> Consequently, we execute subsequent code in divergent mode, and when executing
> a shfl.idx a bit later we run into the undefined behaviour that shfl.idx has
> when executing in divergent mode.
>
> The workaround detects branch-around-nothing, and inserts a ptx operation that
> does nothing (I'm calling it a fake nop; I haven't been able to come up with a
> better term yet):
> ...
>     @ %r36 bra $L5;
>     {
>         .reg .u32 %nop_src;
>         .reg .u32 %nop_dst;
>         mov.u32 %nop_dst, %nop_src;
>     }
> $L5:
> ...
> which makes the test pass, because then we generate a convergence point here
> at .L_1:
> ...
>     /*0128*/ SSY `(.L_1);
>     /*0130*/ @P0 SYNC (*"TARGET= .L_1 "*);
>     /*0138*/ SYNC (*"TARGET= .L_1 "*);
> .L_1:
> ...
>
> The workaround is not minimal given that it inserts the fake nop in all
> branch-around-nothings it detects, not just the warp neutering ones, but I
> think this is more robust than trying to identify the warp neutering branches.
> Furthermore, I'm not going for optimality here anyway. The optimal way to fix
> this is making sure we don't generate branch-around-nothing, but that's for
> stage1.
>
> Built and reg-tested on x86_64 with nvptx accelerator.
>
> I'd like to commit in stage4, but I'd appreciate a review of the code. Does
> the patch look OK?
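The detection step described in the patch mail can be illustrated with a small sketch. This is not the actual nvptx backend code. It assumes a hypothetical flat list of already-printed PTX instructions instead of GCC's RTL, and the names `struct insn`, `branch_around_nothing_p` and `emit_with_workaround` are invented for illustration. Following the stated policy, it flags every branch whose target label is the very next instruction, not only the warp-neutering ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical, simplified instruction representation (not GCC RTL).  */
enum insn_kind { INSN_BRANCH, INSN_LABEL, INSN_OTHER };

struct insn
{
  enum insn_kind kind;
  const char *label;  /* Branch target or label name; NULL otherwise.  */
  const char *text;   /* PTX text to emit for this instruction.  */
};

/* The "fake nop": a register-to-register move that has no effect.  */
static const char fake_nop[] =
  "\t{\n"
  "\t\t.reg .u32 %nop_src;\n"
  "\t\t.reg .u32 %nop_dst;\n"
  "\t\tmov.u32 %nop_dst, %nop_src;\n"
  "\t}\n";

/* Return nonzero if INSNS[I] is a branch whose target label is the
   immediately following instruction, i.e. a branch-around-nothing.  */
int
branch_around_nothing_p (const struct insn *insns, size_t n, size_t i)
{
  return i + 1 < n
	 && insns[i].kind == INSN_BRANCH
	 && insns[i + 1].kind == INSN_LABEL
	 && strcmp (insns[i].label, insns[i + 1].label) == 0;
}

/* Print the N instructions to OUT, placing a fake nop between each
   branch-around-nothing and its target label.  Return the number of
   fake nops inserted.  */
int
emit_with_workaround (FILE *out, const struct insn *insns, size_t n)
{
  int inserted = 0;

  assert (insns != NULL || n == 0);
  for (size_t i = 0; i < n; i++)
    {
      fputs (insns[i].text, out);
      if (branch_around_nothing_p (insns, n, i))
	{
	  fputs (fake_nop, out);
	  inserted++;
	}
    }
  return inserted;
}
```

Because the check only inspects the instruction right after the branch, a branch that skips over any real instruction is left alone, and only the degenerate `@ %r36 bra $L5; $L5:` pattern receives the extra move.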
Ok for stage4, but this isn't a review ;)

Richard.

> Thanks,
> - Tom
>

--
Richard Biener <rguent...@suse.de>
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nuernberg)