Samuel Iglesias Gonsálvez <sigles...@igalia.com> writes: > From: Iago Toral Quiroga <ito...@igalia.com> > > There will be a few places where we need to shuffle the result of a 32-bit > load into valid 64-bit data, so extract this logic into a separate helper > that we can reuse. > > The shuffling needs to operate with WE_all set because we are changing the > layout of the data across the various channels. Otherwise we will run into > problems in non-uniform control-flow scenarios. > I guess you could remove this paragraph because it no longer applies to the last version of the patch.
> v2 (Curro): > - Use subscript() instead of stride() > - Assert on the input types rather than retyping. > - Use offset() instead of horiz_offset(), drop the multiplier definition. > - Do not use a temporary for the writes and drop force_writemask_all. Don't pretend you took my "don't use a temporary" suggestion into account. :P > - Mark component_i as const. Did you forget to git-add something? > - Make the function name lower case. > --- > src/mesa/drivers/dri/i965/brw_fs.cpp | 53 > ++++++++++++++++++++++++++++++++++++ > src/mesa/drivers/dri/i965/brw_fs.h | 5 ++++ > 2 files changed, 58 insertions(+) > > diff --git a/src/mesa/drivers/dri/i965/brw_fs.cpp > b/src/mesa/drivers/dri/i965/brw_fs.cpp > index 15d5759..4827dea 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs.cpp > +++ b/src/mesa/drivers/dri/i965/brw_fs.cpp > @@ -212,6 +212,59 @@ fs_visitor::VARYING_PULL_CONSTANT_LOAD(const fs_builder > &bld, > } > > /** > + * This helper takes the result of a load operation that reads 32-bit > elements > + * in this format: > + * > + * x x x x x x x x > + * y y y y y y y y > + * z z z z z z z z > + * w w w w w w w w > + * > + * and shuffles the data to get this: > + * > + * x y x y x y x y > + * x y x y x y x y > + * z w z w z w z w > + * z w z w z w z w > + * > + * Which is exactly what we want if the load is reading 64-bit components > + * like doubles, where x represents the low 32-bit of the x double component > + * and y represents the high 32-bit of the x double component (likewise with > + * z and w for double component y). The parameter @components represents > + * the number of 64-bit components present in @src. This would typically be > + * 2 at most, since we can only fit 2 double elements in the result of a > + * vec4 load. > + * > + * Notice that @dst and @src can be the same register. > + */ > +void > +fs_visitor::shuffle_32bit_load_result_to_64bit_data(const fs_builder &bld, There was a second reason I had in mind when I suggested it would improve encapsulation to take this out of fs_visitor: The function has absolutely nothing to do with visiting, it uses no internal or external fs_visitor data structures or interfaces, it doesn't even use the "this" pointer. Defining a function that could perfectly be stand-alone inside an object it has no need to be in (in this case it doesn't even have any logical relation with) actually *decreases* encapsulation because it exposes the object's internals to the function unnecessarily. Either way the back-end code is already plagued by this anti-pattern so I'm not going to complain if you keep the code as-is -- You could argue you're just being consistent with the existing practice. ;) > + const fs_reg dst, > + const fs_reg src, Pass by reference. > + uint32_t components) > +{ > + assert(type_sz(src.type) == 4); > + assert(type_sz(dst.type) == 8); > + > + /* A temporary that we will use to shuffle the 32-bit data of each > + * component in the vector into valid 64-bit data. We can't write directly > + * to dst because dst can be (and would usually be) the same as src > + * and in that case the first MOV in the loop below would overwrite the > + * data read in the second MOV. > + */ > + fs_reg tmp = bld.vgrf(dst.type); > + > + for (unsigned i = 0; i < components; i++) { > + fs_reg component_i = offset(src, bld, 2 * i); > + > + bld.MOV(subscript(tmp, src.type, 0), component_i); > + bld.MOV(subscript(tmp, src.type, 1), offset(component_i, bld, 1)); > + > + bld.MOV(offset(dst, bld, i), tmp); Ah, yes, this looks much better, Reviewed-by: Francisco Jerez <curroje...@riseup.net> > + } > +} > + > +/** > * A helper for MOV generation for fixing up broken hardware SEND dependency > * handling. > */ > diff --git a/src/mesa/drivers/dri/i965/brw_fs.h > b/src/mesa/drivers/dri/i965/brw_fs.h > index b906f3d..1970ad0 100644 > --- a/src/mesa/drivers/dri/i965/brw_fs.h > +++ b/src/mesa/drivers/dri/i965/brw_fs.h > @@ -105,6 +105,11 @@ public: > uint32_t const_offset); > void DEP_RESOLVE_MOV(const brw::fs_builder &bld, int grf); > > + void shuffle_32bit_load_result_to_64bit_data(const brw::fs_builder &bld, > + const fs_reg dst, > + const fs_reg src, > + uint32_t components); > + > bool run_fs(bool do_rep_send); > bool run_vs(gl_clip_plane *clip_planes); > bool run_tcs_single_patch(); > -- > 2.5.0 > > _______________________________________________ > mesa-dev mailing list > mesa-dev@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/mesa-dev
signature.asc
Description: PGP signature
_______________________________________________ mesa-dev mailing list mesa-dev@lists.freedesktop.org https://lists.freedesktop.org/mailman/listinfo/mesa-dev