On Wed, 2018-03-28 at 14:55 -0700, Jordan Justen wrote:
> On 2018-03-26 08:23:13, Juan A. Suarez Romero wrote:
> > On Wed, 2018-03-07 at 00:16 -0800, Jordan Justen wrote:
> > > Ken suggested that we might be underallocating scratch space on HD
> > > 400. Allocating scratch space as though there was actually 8 EUs
> > > seems to help with a GPU hang seen on synmark CSDof.
> > >
> >
> > FYI, in order to pick this commit for next 17.3 stable release, I need
> > to pick also:
> >
> > commit f9d5a7add42af5a2e4410526d1480a08f41317ae
> > Author: Jordan Justen <jordan.l.jus...@intel.com>
> > Date:   Tue Oct 31 00:34:32 2017 -0700
> >
> >     i965: Calculate thread_count in brw_alloc_stage_scratch
>
> I believe that this commit led to a regression with compute shaders,
> which was fixed by:
>
> commit a16dc04ad51c32e5c7d136e4dd6273d983385d3f
> Author: Kenneth Graunke <kenn...@whitecape.org>
> Date:   Tue Oct 31 00:56:24 2017 -0700
>
>     i965: properly initialize brw->cs.base.stage to MESA_SHADER_COMPUTE
>
> You should probably add Ken's a16dc04ad51c before f9d5a7add42a.
>

Thanks a lot! Fortunately, a16dc04ad51c was already nominated and included
in 17.3.0. So it is in the stable branch.

	J.A.

> -Jordan
>
> > >
> > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=104636
> > > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=105290
> > > Cc: Kenneth Graunke <kenn...@whitecape.org>
> > > Cc: Eero Tamminen <eero.t.tammi...@intel.com>
> > > Cc: <mesa-sta...@lists.freedesktop.org>
> > > Signed-off-by: Jordan Justen <jordan.l.jus...@intel.com>
> > > ---
> > >  src/mesa/drivers/dri/i965/brw_program.c | 44 ++++++++++++++++++++-------------
> > >  1 file changed, 27 insertions(+), 17 deletions(-)
> > >
> > > diff --git a/src/mesa/drivers/dri/i965/brw_program.c b/src/mesa/drivers/dri/i965/brw_program.c
> > > index 527f003977b..c121136c439 100644
> > > --- a/src/mesa/drivers/dri/i965/brw_program.c
> > > +++ b/src/mesa/drivers/dri/i965/brw_program.c
> > > @@ -402,23 +402,33 @@ brw_alloc_stage_scratch(struct brw_context *brw,
> > >           if (devinfo->gen >= 9)
> > >              subslices = 4 * brw->screen->devinfo.num_slices;
> > >
> > > -         /* WaCSScratchSize:hsw
> > > -          *
> > > -          * Haswell's scratch space address calculation appears to be sparse
> > > -          * rather than tightly packed. The Thread ID has bits indicating
> > > -          * which subslice, EU within a subslice, and thread within an EU
> > > -          * it is. There's a maximum of two slices and two subslices, so these
> > > -          * can be stored with a single bit. Even though there are only 10 EUs
> > > -          * per subslice, this is stored in 4 bits, so there's an effective
> > > -          * maximum value of 16 EUs. Similarly, although there are only 7
> > > -          * threads per EU, this is stored in a 3 bit number, giving an effective
> > > -          * maximum value of 8 threads per EU.
> > > -          *
> > > -          * This means that we need to use 16 * 8 instead of 10 * 7 for the
> > > -          * number of threads per subslice.
> > > -          */
> > > -         const unsigned scratch_ids_per_subslice =
> > > -            devinfo->is_haswell ? 16 * 8 : devinfo->max_cs_threads;
> > > +         unsigned scratch_ids_per_subslice;
> > > +         if (devinfo->is_haswell) {
> > > +            /* WaCSScratchSize:hsw
> > > +             *
> > > +             * Haswell's scratch space address calculation appears to be sparse
> > > +             * rather than tightly packed. The Thread ID has bits indicating
> > > +             * which subslice, EU within a subslice, and thread within an EU it
> > > +             * is. There's a maximum of two slices and two subslices, so these
> > > +             * can be stored with a single bit. Even though there are only 10 EUs
> > > +             * per subslice, this is stored in 4 bits, so there's an effective
> > > +             * maximum value of 16 EUs. Similarly, although there are only 7
> > > +             * threads per EU, this is stored in a 3 bit number, giving an
> > > +             * effective maximum value of 8 threads per EU.
> > > +             *
> > > +             * This means that we need to use 16 * 8 instead of 10 * 7 for the
> > > +             * number of threads per subslice.
> > > +             */
> > > +            scratch_ids_per_subslice = 16 * 8;
> > > +         } else if (devinfo->is_cherryview) {
> > > +            /* For Cherryview, it appears that the scratch addresses for the 6 EU
> > > +             * devices may still generate compute scratch addresses covering the
> > > +             * same range as 8 EU.
> > > +             */
> > > +            scratch_ids_per_subslice = 8 * 7;
> > > +         } else {
> > > +            scratch_ids_per_subslice = devinfo->max_cs_threads;
> > > +         }
> > >
> > >           thread_count = scratch_ids_per_subslice * subslices;
> > >           break;
>
> _______________________________________________
> mesa-stable mailing list
> mesa-sta...@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/mesa-stable
_______________________________________________
mesa-dev mailing list
mesa-dev@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
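
For anyone reading the quoted patch without the driver source at hand, the
arithmetic in its comments can be checked with a small standalone sketch.
This is not Mesa code: struct sketch_devinfo, ids_in_field() and
check_patch_arithmetic() are hypothetical names made up for illustration,
and the 6 * 7 value used for a 6 EU Cherryview part is an assumption taken
from the "7 threads per EU" figure in the comments, not from the driver.

```c
#include <assert.h>

/* Hypothetical stand-in for the few device-info fields the patch reads;
 * this is NOT Mesa's real device-info struct.
 */
struct sketch_devinfo {
   int is_haswell;
   int is_cherryview;
   unsigned max_cs_threads;   /* value used by the fallback branch */
};

/* Number of distinct values an n-bit Thread ID field can encode. */
unsigned
ids_in_field(unsigned bits)
{
   return 1u << bits;
}

/* Mirrors the if/else chain added by the patch. */
unsigned
scratch_ids_per_subslice(const struct sketch_devinfo *devinfo)
{
   if (devinfo->is_haswell) {
      /* 4-bit EU field times 3-bit thread field, i.e. 16 * 8. */
      return ids_in_field(4) * ids_in_field(3);
   } else if (devinfo->is_cherryview) {
      /* Size for 8 EUs * 7 threads, even on 6 EU parts. */
      return 8 * 7;
   } else {
      return devinfo->max_cs_threads;
   }
}

/* Worked numbers from the patch comments. */
void
check_patch_arithmetic(void)
{
   struct sketch_devinfo hsw = { 1, 0, 10 * 7 };
   struct sketch_devinfo chv = { 0, 1, 6 * 7 };   /* assumed value */
   struct sketch_devinfo other = { 0, 0, 56 };    /* arbitrary example */

   assert(scratch_ids_per_subslice(&hsw) == 128);
   assert(scratch_ids_per_subslice(&chv) == 56);
   assert(scratch_ids_per_subslice(&other) == 56);

   /* Scratch IDs that a tightly packed count would leave unallocated. */
   assert(scratch_ids_per_subslice(&hsw) - 10 * 7 == 58);
   assert(scratch_ids_per_subslice(&chv) - 6 * 7 == 14);
}
```

Calling check_patch_arithmetic() from a test main() exercises all three
branches. The last two assertions show the gap the patch is closing: a
tightly packed count would come up 58 IDs short per subslice on Haswell
and, under the assumption above, 14 short on a 6 EU Cherryview.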