On Thu, 2017-09-28 at 00:59 -0700, Kenneth Graunke wrote: > On Thursday, September 28, 2017 12:36:27 AM PDT Iago Toral Quiroga > wrote: > > Triggering the push model when 64-bit inputs are involved is not > > easy due to > > the constrains on the maximum number of registers that we allow for > > this mode, > > however, for GS with 'points' primitive type and just a couple of > > double > > varyings we can trigger this and it just doesn't work because the > > implementation is not 64-bit aware at all. For now, let's make sure > > that we > > don't attempt this model whith 64-bit inputs and we always fall > > back to pull > > model for them. > > > > Also, don't enable the VUE handles in the thread payload on the fly > > when we > > find an input for which we need the pull model, this is not safe: > > if we need > > to resort to the pull model we need to account for that when we > > setup the > > thread payload so we compute the first non-payload register > > properly. If we > > didn't do that correctly and we enable it on-the-fly here then we > > will end up > > VUE handles on the first non-payload register which will probably > > lead to > > GPU hangs. Instead, always enable the VUE handles for the pull > > model so we > > can safely use them when needed. The GS is going to resort to pull > > model > > almost in every situation anyway, so this shouldn't make a > > significant > > difference and it makes things easier and safer. > > > > v2: Always enable the VUE handles for pull model, this is easier > > and safer > > and the GS is going to fallback to pull model almost always > > anyway (Ken) > > --- > > src/intel/compiler/brw_fs.cpp | 30 +++++++++++++++++-------- > > ----- > > src/intel/compiler/brw_fs_nir.cpp | 4 +++- > > 2 files changed, 20 insertions(+), 14 deletions(-) > > > > diff --git a/src/intel/compiler/brw_fs.cpp > > b/src/intel/compiler/brw_fs.cpp > > index eb9b4c38909..1517e87158b 100644 > > --- a/src/intel/compiler/brw_fs.cpp > > +++ b/src/intel/compiler/brw_fs.cpp > > @@ -5602,25 +5602,29 @@ fs_visitor::setup_gs_payload() > > payload.num_regs++; > > } > > > > + /* Always enable VUE handles so we can safely use pull model if > > needed. > > + * > > + * The push model for a GS uses a ton of register space even > > for trivial > > + * scenarios with just a few inputs, so just make things easier > > and a bit > > + * safer by always having pull model available. > > + */ > > + gs_prog_data->base.include_vue_handles = true; > > + > > + /* R3..RN: ICP Handles for each incoming vertex (when using > > pull model) */ > > + payload.num_regs += nir->info.gs.vertices_in; > > + > > /* Use a maximum of 24 registers for push-model inputs. */ > > const unsigned max_push_components = 24; > > > > - /* If pushing our inputs would take too many registers, reduce > > the URB read > > - * length (which is in HWords, or 8 registers), and resort to > > pulling. > > + /* We want to use a maximum of 24 registers for push-model > > inputs, so make > > + * sure our URB read length (which is in HWords, or 8 > > registers) respects > > + * that. > > * > > * Note that the GS reads <URB Read Length> HWords for every > > vertex - so we > > - * have to multiply by VerticesIn to obtain the total storage > > requirement. > > + * have to account for by VerticesIn to obtain the total > > storage requirement. > > */ > > - if (8 * vue_prog_data->urb_read_length * nir- > > >info.gs.vertices_in > > > - max_push_components || gs_prog_data->invocations > 1) { > > - gs_prog_data->base.include_vue_handles = true; > > - > > - /* R3..RN: ICP Handles for each incoming vertex (when using > > pull model) */ > > - payload.num_regs += nir->info.gs.vertices_in; > > - > > - vue_prog_data->urb_read_length = > > - ROUND_DOWN_TO(max_push_components / nir- > > >info.gs.vertices_in, 8) / 8; > > - } > > + vue_prog_data->urb_read_length = > > + ROUND_DOWN_TO(max_push_components / nir- > > >info.gs.vertices_in, 8) / 8; > > } > > Hmm, won't this set urb_read_length to the maximum all the time? > > This code was meant to clamp an existing value to the max, if it > exceeds that, but leave it less than that if that's all we needed...
Right, for some reason I convinced myself that we could just always use the maximum, but you're right that doesn't make sense if we have less than that, I don't know what I was thinking... sorry, I'll send a v3. Iago _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev