On Thursday, June 9, 2016 10:50:53 AM PDT Kenneth Graunke wrote:
> On Thursday, June 9, 2016 10:00:40 AM PDT Ilia Mirkin wrote:
> > On Jun 9, 2016 4:10 AM, "Kenneth Graunke" <[email protected]> wrote:
> > >
> > > Skylake changes the representation of shared local memory size:
> > >
> > >  Size   | 0 kB | 1 kB | 2 kB | 4 kB | 8 kB | 16 kB | 32 kB | 64 kB |
> > >  -------------------------------------------------------------------
> > >  Gen7-8 |    0 | none | none |    1 |    2 |     3 |     4 |     5 |
> > >  -------------------------------------------------------------------
> > >  Gen9+  |    0 |    1 |    2 |    3 |    4 |     5 |     6 |     7 |
> > >
> > > The old formula would substantially underallocate the amount of space.
> > > This fixes GPU hangs on Skylake when running with full thread counts.
> > >
> > > Cc: "12.0" <[email protected]>
> > > Signed-off-by: Kenneth Graunke <[email protected]>
> > > ---
> > >  src/mesa/drivers/dri/i965/gen7_cs_state.c | 15 ++++++++++-----
> > >  1 file changed, 10 insertions(+), 5 deletions(-)
> > >
> > > diff --git a/src/mesa/drivers/dri/i965/gen7_cs_state.c b/src/mesa/drivers/dri/i965/gen7_cs_state.c
> > > index 750aa2c..aff1f4e 100644
> > > --- a/src/mesa/drivers/dri/i965/gen7_cs_state.c
> > > +++ b/src/mesa/drivers/dri/i965/gen7_cs_state.c
> > > @@ -150,11 +150,16 @@ brw_upload_cs_state(struct brw_context *brw)
> > >     assert(prog_data->total_shared <= 64 * 1024);
> > >     uint32_t slm_size = 0;
> > >     if (prog_data->total_shared > 0) {
> > > -      /* slm_size is in 4k increments, but must be a power of 2. */
> > > -      slm_size = 4 * 1024;
> > > -      while (slm_size < prog_data->total_shared)
> > > -         slm_size <<= 1;
> > > -      slm_size /= 4 * 1024;
> > > +      /* Shared Local Memory Size is specified as powers of two. */
> > > +      slm_size = util_next_power_of_two(prog_data->total_shared);
> > > +
> > > +      if (brw->gen >= 9) {
> > > +         /* Use a minimum of 1kB; turn an exponent of 10 (1024 bytes) into 1. */
> > > +         slm_size = ffs(MAX2(slm_size, 1024)) - 10;
> > > +      } else {
> > > +         /* Use a minimum of 4kB; convert to the pre-Gen9 representation. */
> > > +         slm_size = MAX2(slm_size, 4096) / 4096;
> > 
> > According to your chart, 16k should end up with 3, but this logic will
> > produce 4. The old comment said it was in increments of 4k, so I'm guessing
> > just the chart needs to be adjusted.
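To make the underallocation from the commit message concrete, here is a minimal standalone sketch. It is not Mesa code, and `old_slm_encoding()` and `gen9_decode_kb()` are hypothetical names. The first copies the pre-patch loop. The second assumes a Gen9+ part interprets encoding n exactly as the Gen9+ row of the table says (n = 1..7 means 2^(n-1) kB). Under that assumption, 4 kB encodes to 1 and is read back as 1 kB, 8 kB is read back as 2 kB, and 16 kB encodes to 4 (matching Ilia's reading of the code) but is read back as only 8 kB. 32 kB and 64 kB encode to 8 and 16, which the Gen9+ row does not list at all.

```c
#include <assert.h>

/* Pre-patch encoding copied from brw_upload_cs_state(): round up to a
 * power of two starting at 4 kB, then express the size in 4 kB units. */
static unsigned old_slm_encoding(unsigned total_shared)
{
   assert(total_shared <= 64 * 1024);
   unsigned slm_size = 0;
   if (total_shared > 0) {
      slm_size = 4 * 1024;
      while (slm_size < total_shared)
         slm_size <<= 1;
      slm_size /= 4 * 1024;
   }
   return slm_size;
}

/* Kilobytes a Gen9+ part would allocate for encoding n, read off the
 * Gen9+ row of the table (0 -> 0 kB, 1 -> 1 kB, ..., 7 -> 64 kB).
 * Returns -1 for encodings the table does not list. */
static int gen9_decode_kb(unsigned n)
{
   if (n == 0)
      return 0;
   if (n > 7)
      return -1;
   return 1 << (n - 1);
}
```

For example, `gen9_decode_kb(old_slm_encoding(16 * 1024))` evaluates to 8: a shader asking for 16 kB would get half of that on Skylake.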
> 
> Yikes, sorry!  A wrong chart is better than no chart at all.  I meant:

That should read "worse", not "better".  :(  Wow.

>   Size   | 0 kB | 1 kB | 2 kB | 4 kB | 8 kB | 16 kB | 32 kB | 64 kB |
>   -------------------------------------------------------------------
>   Gen7-8 |    0 | none | none |    1 |    2 |     4 |     8 |    16 |
>   -------------------------------------------------------------------
>   Gen9+  |    0 |    1 |    2 |    3 |    4 |     5 |     6 |     7 |
> 
> I should probably move this code to a helper function and put the
> (correct) table in a comment...
> 
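Following the helper-function idea above, here is one possible shape for it. This is a minimal sketch that assumes the corrected table is authoritative, and it is not necessarily what was committed. `encode_slm_size()` and `next_pow2()` are hypothetical names. `next_pow2()` stands in for Mesa's `util_next_power_of_two()`, and the `gen` parameter stands in for `brw->gen`. The Gen9+ branch keeps the patch's `ffs()` trick, since `ffs(1024) == 11`.

```c
#include <assert.h>
#include <strings.h>  /* ffs() */

/*
 * Shared Local Memory Size encoding (corrected table):
 *
 *  Size   | 0 kB | 1 kB | 2 kB | 4 kB | 8 kB | 16 kB | 32 kB | 64 kB |
 *  Gen7-8 |    0 | none | none |    1 |    2 |     4 |     8 |    16 |
 *  Gen9+  |    0 |    1 |    2 |    3 |    4 |     5 |     6 |     7 |
 */

/* Hypothetical stand-in for util_next_power_of_two(); v is small here. */
static unsigned next_pow2(unsigned v)
{
   unsigned p = 1;
   while (p < v)
      p <<= 1;
   return p;
}

/* Hypothetical helper: encode a shared-memory size in bytes for 'gen'. */
static unsigned encode_slm_size(int gen, unsigned bytes)
{
   assert(bytes <= 64 * 1024);
   if (bytes == 0)
      return 0;

   unsigned size = next_pow2(bytes);

   if (gen >= 9) {
      /* Minimum 1 kB; 2^10 has ffs() == 11, so 1 kB -> 1 ... 64 kB -> 7. */
      if (size < 1024)
         size = 1024;
      return (unsigned)(ffs((int)size) - 10);
   }

   /* Minimum 4 kB; pre-Gen9 counts in 4 kB units (4 kB -> 1 ... 64 kB -> 16). */
   if (size < 4096)
      size = 4096;
   return size / 4096;
}
```

With this helper, 16 kB encodes to 4 on Gen7-8 and to 5 on Gen9+, and a 3000-byte request rounds up to 4 kB, giving 1 and 3 respectively.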


_______________________________________________
mesa-dev mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/mesa-dev
