On 2018-03-07 00:47:12, Kenneth Graunke wrote: > On Wednesday, March 7, 2018 12:16:28 AM PST Jordan Justen wrote: > > Suggested-by: Kenneth Graunke <kenn...@whitecape.org> > > Signed-off-by: Jordan Justen <jordan.l.jus...@intel.com> > > --- > > src/intel/dev/gen_device_info.c | 13 ++++++++----- > > src/intel/dev/gen_device_info.h | 8 ++++++++ > > 2 files changed, 16 insertions(+), 5 deletions(-) > > > > diff --git a/src/intel/dev/gen_device_info.c > > b/src/intel/dev/gen_device_info.c > > index 1773009d33c..de018c6a692 100644 > > --- a/src/intel/dev/gen_device_info.c > > +++ b/src/intel/dev/gen_device_info.c > > @@ -284,11 +284,13 @@ static const struct gen_device_info > > gen_device_info_byt = { > > }, > > }; > > > > -#define HSW_FEATURES \ > > - GEN7_FEATURES, \ > > - .is_haswell = true, \ > > - .supports_simd16_3src = true, \ > > - .has_resource_streamer = true > > +#define HSW_FEATURES \ > > + GEN7_FEATURES, \ > > + .is_haswell = true, \ > > + .supports_simd16_3src = true, \ > > + .has_resource_streamer = true, \ > > + /* WaCSScratchSize:hsw */ \ > > + .cs_scratch_ids_per_subslice = 16 * 8 > > > > static const struct gen_device_info gen_device_info_hsw_gt1 = { > > HSW_FEATURES, .gt = 1, > > @@ -476,6 +478,7 @@ static const struct gen_device_info gen_device_info_chv > > = { > > .max_gs_threads = 80, > > .max_wm_threads = 128, > > .max_cs_threads = 6 * 7, > > + .cs_scratch_ids_per_subslice = 8 * 7, > > .urb = { > > .size = 192, > > .min_entries = { > > diff --git a/src/intel/dev/gen_device_info.h > > b/src/intel/dev/gen_device_info.h > > index b8044d00032..fa6b38f19ec 100644 > > --- a/src/intel/dev/gen_device_info.h > > +++ b/src/intel/dev/gen_device_info.h > > @@ -148,6 +148,14 @@ struct gen_device_info > > */ > > unsigned max_cs_threads; > > > > + /** > > + * The range of scratch addresses may differ from the number of EUs > > + * available for compute programs. This requires allocating a larger > > + * scratch buffer. For affected hardware, this will be set. If this is > > not > > + * set, then max_cs_threads should be used instead. > > + */ > > + unsigned cs_scratch_ids_per_subslice; > > + > > struct { > > /** > > * Hardware default URB size. > > > > This works, but it's not quite what I had in mind. I was thinking you > would add the new field, then add this hunk to gen_get_device_info(): > > if (devinfo->is_haswell) { > /* WaCSScratchSize:hsw > * > * Haswell's scratch space address calculation appears to be sparse > * rather than tightly packed. The Thread ID has bits indicating > * which subslice, EU within a subslice, and thread within an EU it > * is. There's a maximum of two slices and two subslices, so these > * can be stored with a single bit. Even though there are only 10 EUs > * per subslice, this is stored in 4 bits, so there's an effective > * maximum value of 16 EUs. Similarly, although there are only 7 > * threads per EU, this is stored in a 3 bit number, giving an > * effective maximum value of 8 threads per EU. > * > * This means that we need to use 16 * 8 instead of 10 * 7 for the > * number of threads per subslice. > */ > devinfo->cs_scratch_ids_per_subslice = 16 * 8; > } else if (devinfo->is_cherryview) { > /* For Cherryview, it appears that the scratch addresses for the 6 EU > * devices may still generate compute scratch addresses covering the > * same range as 8 EU. > */ > devinfo->cs_scratch_ids_per_subslice = 8 * 7; > } else { > devinfo->cs_scratch_ids_per_subslice = devinfo->max_cs_threads; > }
Originally, I had initialized them in the structures, and then had: if (devinfo->cs_scratch_ids_per_subslice == 0) devinfo->cs_scratch_ids_per_subslice = devinfo->max_cs_threads; I used it directly, like you mentioned below. Then I got concerned about how the kernel params might update max_cs_threads, and should that cause cs_scratch_ids_per_subslice to be updated? So, I changed it to be treated as an override. But, I could change the code that updates max_cs_threads to also update cs_scratch_ids_per_subslice. > This way, the field is always meaningful, and so in patches 4-5 you can > just use devinfo->cs_scratch_ids_per_subslice directly rather than doing > > const unsigned scratch_ids_per_subslice = > devinfo->cs_scratch_ids_per_subslice ? > devinfo->cs_scratch_ids_per_subslice : devinfo->max_cs_threads; > > This also has the side effect of preserving the comment, which I think > is valuable. Well, I guess I thought that the comment might be a bit overwrought. :) I mean, it's good to know that there's a special circumstance, but maybe it's not too bad if you have to find the full details in the commit history? :) Nevertheless, I'll keep it. -Jordan > There's already some precedent for this, as > gen_get_device_info() already whacks max_wm_threads precisely to deal > with scratch IDs being somewhat sparse. > > --Ken _______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev