This fixes the few code quality regressions from the previous series enabling SIMD32 CS codegen in the back-end -- AFAICT by the end of the series we can finally enable GL 4.3 on all Gen8+ hardware.

Patches 1-8 delay the SIMD lowering pass until after the bulk of the optimization passes have run. This should decrease compilation time, mainly for SIMD32 shaders, and improve code quality both for SIMD32 shaders on all generations and for shaders of any dispatch width on older generations (up to and including IVB), which use SIMD lowering more intensively to implement various workarounds.

Patches 9-14 rework the SIMD lowering pass to avoid emitting the copy instructions used to zip and unzip register regions where possible, since the register coalesce and copy propagation passes seem to perform rather poorly at getting rid of them in some cases. In the long term we'll likely want to improve the register coalesce pass irrespective of these changes.

Patches 15-20 improve the compute-to-mrf pass used on Gen4-6 to handle cases where the source of a VGRF-to-MRF copy is initialized by the shader using multiple single-GRF writes, which becomes far more common with the additional SIMD lowering going on after this series.

Patches 21-24 are some other assorted changes improving code quality on older gens.

I wanted to provide more detailed (e.g. per-commit) shader-db stats with this series, but kind of ran out of time. Let me know if you would like to see more evidence that any of the changes below improves code quality, in case it's not clear from the commit alone.

[PATCH 01/25] i965/fs: Let CSE handle logical sampler sends as expressions.
[PATCH 02/25] i965/fs: Allow constant propagation into logical send sources.
[PATCH 03/25] i965/fs: Add FS_OPCODE_FB_WRITE_LOGICAL to has_side_effects().
[PATCH 04/25] i965/fs: Run SIMD and logical send lowering after the optimization loop.
[PATCH 05/25] i965/fs: Take opt_redundant_discard_jumps out of the optimization loop.
[PATCH 06/25] i965/fs: Fix UB list sentinel dereference in opt_sampler_eot().
[PATCH 07/25] i965/fs: Implement opt_sampler_eot() in terms of logical sends.
[PATCH 08/25] SQUASH: i965/fs: Add basic dataflow check to opt_sampler_eot().
[PATCH 09/25] i965/fs: Refactor offset() into a separate function taking the width as argument.
[PATCH 10/25] i965/fs: Generalize regions_overlap() from copy propagation to handle non-VGRF files.
[PATCH 11/25] i965/fs: Factor out region zipping and unzipping from the SIMD lowering pass.
[PATCH 12/25] i965/fs: Skip SIMD lowering source unzipping for regular scalar regions.
[PATCH 13/25] i965/fs: Skip SIMD lowering destination zipping if possible.
[PATCH 14/25] i965/fs: Reindent emit_zip().
[PATCH 15/25] i965/fs: Teach regions_overlap() about COMPR4 MRF regions.
[PATCH 16/25] i965/fs: Simplify and improve accuracy of compute_to_mrf() by using regions_overlap().
[PATCH 17/25] i965/fs: Fix compute-to-mrf VGRF region coverage condition.
[PATCH 18/25] i965/fs: Refactor compute_to_mrf() to split search and rewrite into separate loops.
[PATCH 19/25] i965/fs: Teach compute_to_mrf about the COMPR4 address transformation.
[PATCH 20/25] i965/fs: Extend compute_to_mrf() to coalesce VGRFs initialized by multiple single-GRF writes.
[PATCH 21/25] i965/fs: Extend remove_duplicate_mrf_writes() to handle non-VGRF to MRF copies.
[PATCH 22/25] i965/fs: Fix constant combining for instructions that cannot accept source mods.
[PATCH 23/25] i965/fs: Allow scalar source regions on SNB math instructions.
[PATCH 24/25] i965/fs: Skip gen4 pre/post-send dependency workarounds for the first/last block.
[PATCH 25/25] i965: Expose GL 4.3 on Gen8+.
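
For anyone unfamiliar with the zipping and unzipping that patches 9-14 try to avoid, here is a rough toy model in Python. It is only a sketch of the idea, not the i965 code: every name in it (unzip, zip_into, lower) is hypothetical, a "register region" is just a list of channel values laid out component by component, and only the source-side copies are counted. It assumes a lowered instruction needs its source channels gathered into a temporary unless the source is a scalar (uniform) region, in which case every channel reads the same value and no copy is needed.

```python
# Toy model of SIMD lowering zip/unzip. NOT the i965 implementation:
# all names are hypothetical and a region is a plain Python list laid
# out as `components` consecutive blocks of `width` channels each.

def unzip(src, width, lowered_width, group, components):
    """Gather the channels of one lowered group into a temporary region."""
    out = []
    for c in range(components):
        base = c * width + group * lowered_width
        out.extend(src[base:base + lowered_width])
    return out


def zip_into(dst, part, width, lowered_width, group, components):
    """Scatter one lowered group's result back into the full-width region."""
    for c in range(components):
        base = c * width + group * lowered_width
        dst[base:base + lowered_width] = \
            part[c * lowered_width:(c + 1) * lowered_width]


def lower(f, src, width, lowered_width, components, scalar=False):
    """Apply f per channel, one lowered group at a time.

    Returns the full-width result and the number of unzip copies made.
    For a scalar source, `src` holds one value per component and is read
    directly (think of a stride-0 region), so no unzip copy is counted.
    """
    dst = [None] * (width * components)
    unzip_copies = 0
    for group in range(width // lowered_width):
        if scalar:
            part = [f(src[c]) for c in range(components)
                    for _ in range(lowered_width)]
        else:
            tmp = unzip(src, width, lowered_width, group, components)
            unzip_copies += 1
            part = [f(x) for x in tmp]
        zip_into(dst, part, width, lowered_width, group, components)
    return dst, unzip_copies
```

In this model, lowering a two-component SIMD16 operation to SIMD8 costs one unzip copy per half for a regular source and none for a scalar one, which is the kind of copy patch 12 skips.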