URL: http://cgit.freedesktop.org/mesa/mesa/commit/?id=2e75d71c1faa737ef3290ff1e9cb4851762fa381
Author: Ian Romanick <ian.d.roman...@intel.com>
Date: Wed Nov 15 10:48:02 2023 -0800

intel/cmat: Generate better code for nir_intrinsic_cmat_insert

When the source destination index is a constant, we can avoid generating a lot of the intermediate code. At the very least, this makes initial NIR dumps much easier to read.

v2: Simplify tracking of dst_index. Suggested by Caio.

Suggested-by: Caio
Reviewed-by: Caio Oliveira <caio.olive...@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>

URL: http://cgit.freedesktop.org/mesa/mesa/commit/?id=c6d44284aa633569a58200d00015b3e6d80a465a
Author: Ian Romanick <ian.d.roman...@intel.com>
Date: Wed Aug 2 13:36:33 2023 -0700

intel/dev: Enable VK_KHR_cooperative_matrix on all Gfx9+ GPUs

Gfx12.5 (DG2) will use DPAS instructions to accelerate the implementation. Earlier platforms will use equivalent discrete instructions (basically subgroup operations). Gfx12 (Tigerlake) will use DP4A for 8-bit integer matrix multiplication. Older platforms, which lack DP4A, will use a suboptimal instruction sequence. There is plenty of room for improvement here.

DG2 (Gfx12.5) gets the following results from the CTS:

    Test run totals:
      Passed:        1642/13982 (11.7%)
      Failed:        0/13982 (0.0%)
      Not supported: 12340/13982 (88.3%)
      Warnings:      0/13982 (0.0%)
      Waived:        0/13982 (0.0%)

DG2 (Gfx12.5) with forced lowering, Raptor Lake (Gfx12), and Ice Lake (Gfx11) get:

    Test run totals:
      Passed:        1662/13982 (11.9%)
      Failed:        0/13982 (0.0%)
      Not supported: 12320/13982 (88.1%)
      Warnings:      0/13982 (0.0%)
      Waived:        0/13982 (0.0%)

The difference in the number of tests run is due to saturatingAccumulation not being set on DG2 when DPAS is used. There is a comment in "intel/dev: Advertise integer configs with saturatingAccumulation too" that explains how this could be added should the need arise.

v2: Prefix type names with INTEL_CMAT_. Suggested by Lionel.
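
Note on the DP4A and saturatingAccumulation remarks above: the following is a minimal scalar sketch of what a 4-way signed 8-bit dot product with a 32-bit accumulator computes, with and without saturation. It is not code from this series. The names pack_s8x4, dot4_s8, dp4a_s8, and dp4a_s8_sat are hypothetical, and the byte order (element 0 in the low byte) is an assumption made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: pack four signed 8-bit values into one 32-bit
 * word, element 0 in the low byte (an assumption of this sketch). */
uint32_t pack_s8x4(int8_t x0, int8_t x1, int8_t x2, int8_t x3)
{
   return (uint32_t)(uint8_t)x0 |
          ((uint32_t)(uint8_t)x1 << 8) |
          ((uint32_t)(uint8_t)x2 << 16) |
          ((uint32_t)(uint8_t)x3 << 24);
}

/* Sign-extend byte i of a packed word without relying on
 * implementation-defined narrowing conversions. */
static int32_t extract_s8(uint32_t v, unsigned i)
{
   int32_t byte = (int32_t)((v >> (8 * i)) & 0xff);
   return byte >= 128 ? byte - 256 : byte;
}

/* Sum of the four element-wise products, kept in 64 bits. */
int64_t dot4_s8(uint32_t a, uint32_t b)
{
   int64_t sum = 0;
   for (unsigned i = 0; i < 4; i++)
      sum += (int64_t)extract_s8(a, i) * extract_s8(b, i);
   return sum;
}

/* Non-saturating accumulate. This sketch does not model wraparound;
 * it asserts that the result fits in 32 bits instead. */
int32_t dp4a_s8(uint32_t a, uint32_t b, int32_t acc)
{
   int64_t r = (int64_t)acc + dot4_s8(a, b);
   assert(r >= INT32_MIN && r <= INT32_MAX);
   return (int32_t)r;
}

/* Saturating accumulate: clamp the result to the int32_t range. */
int32_t dp4a_s8_sat(uint32_t a, uint32_t b, int32_t acc)
{
   int64_t r = (int64_t)acc + dot4_s8(a, b);
   if (r > INT32_MAX)
      return INT32_MAX;
   if (r < INT32_MIN)
      return INT32_MIN;
   return (int32_t)r;
}
```

For example, dp4a_s8(pack_s8x4(1, 2, 3, 4), pack_s8x4(5, 6, 7, 8), 10) accumulates 5 + 12 + 21 + 32 onto 10. The saturating variant differs only when the sum leaves the 32-bit range, which is the case the saturatingAccumulation property describes.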
Reviewed-by: Caio Oliveira <caio.olive...@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>

URL: http://cgit.freedesktop.org/mesa/mesa/commit/?id=8ea032b78ee3257fd9398db8b79cdf9ca5ff4a36
Author: Ian Romanick <ian.d.roman...@intel.com>
Date: Fri Oct 20 18:24:25 2023 -0700

intel/dev: Advertise integer configs with saturatingAccumulation too

VUID-RuntimeSpirv-saturatingAccumulation-08983 says:

    For OpCooperativeMatrixMulAddKHR, the SaturatingAccumulation cooperative matrix operand must be present if and only if VkCooperativeMatrixPropertiesKHR::saturatingAccumulation is VK_TRUE.

As a result, we have to advertise integer configs both with and without this flag set.

v2: Prefix type names with INTEL_CMAT_. Suggested by Lionel.

Reviewed-by: Caio Oliveira <caio.olive...@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>

URL: http://cgit.freedesktop.org/mesa/mesa/commit/?id=f952dd510e4e83639f77259baaa61ff25c918305
Author: Ian Romanick <ian.d.roman...@intel.com>
Date: Tue Aug 1 10:38:14 2023 -0700

anv: Select the SIMD mode very early when cooperative matrices are used

The commit is a little ugly. The definition of anv_fixup_subgroup_size is moved before the added call site. In addition, the bit starting at the "Cooperative matrix extension requires..." comment is added.

v2: Dramatic simplification of SIMD selection. Suggested by Caio.

Reviewed-by: Caio Oliveira <caio.olive...@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>

URL: http://cgit.freedesktop.org/mesa/mesa/commit/?id=511f91e307c98326185ec69570b0c6eee2c36cab
Author: Ian Romanick <ian.d.roman...@intel.com>
Date: Tue Aug 8 09:32:40 2023 -0700

anv: Lower indirect derefs again after lowering cooperative matrices

The cooperative matrix lowering can generate a lot of indirect array accesses, and these need to be eliminated.
Reviewed-by: Caio Oliveira <caio.olive...@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>

URL: http://cgit.freedesktop.org/mesa/mesa/commit/?id=b741a9a851ca3747aa92ce0d6611b488c6e0e07b
Author: Ian Romanick <ian.d.roman...@intel.com>
Date: Mon Sep 25 09:16:55 2023 -0700

anv: Set PIPELINE_SELECT systolic mode enable flag

Set the flag on compute shaders when the application has enabled the cooperative matrix feature. We might still want to enable this only when DPAS is actually used.

The current method is based on many suggestions from Lionel.

Reviewed-by: Lionel Landwerlin <lionel.g.landwer...@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>

URL: http://cgit.freedesktop.org/mesa/mesa/commit/?id=7bfbeb79a75a04c3a7baa0e230a5bd4efa0976c4
Author: Ian Romanick <ian.d.roman...@intel.com>
Date: Fri Sep 22 16:17:18 2023 -0700

anv: Set COMPUTE_WALKER systolic mode enable flag

Reviewed-by: Lionel Landwerlin <lionel.g.landwer...@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>

URL: http://cgit.freedesktop.org/mesa/mesa/commit/?id=67739b02de08e97128673f05bf1a525047873d3e
Author: Ian Romanick <ian.d.roman...@intel.com>
Date: Mon Oct 30 11:06:24 2023 -0700

anv: Add anv_physical_device::has_cooperative_matrix

This flag tracks whether or not cooperative matrices are fully enabled on the physical device (i.e., both the configs exist and the environment variable is set). This is mainly to support a later commit "anv: Set PIPELINE_SELECT systolic mode enable flag."

This could be squashed into "anv: Implement VK_KHR_cooperative_matrix." I left it separate because we might go back to the previous method.

v3: Don't hide the extension behind an environment variable (ANV_COOPERATIVE_MATRIX) now that we have a better solution for setting PIPELINE_SELECT.
Reviewed-by: Caio Oliveira <caio.olive...@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>

URL: http://cgit.freedesktop.org/mesa/mesa/commit/?id=0a6f8b40bfdf39faaf1ff7def741faf612cf5706
Author: Caio Oliveira <caio.olive...@intel.com>
Date: Tue Jun 13 19:48:16 2023 -0700

anv: Implement VK_KHR_cooperative_matrix

v2: Rebase on moving lowering pass to src/intel/compiler.

v3: Don't hide the extension behind an environment variable (ANV_COOPERATIVE_MATRIX) now that we have a better solution for setting PIPELINE_SELECT.

v4: Prefix type names with INTEL_CMAT_. Suggested by Lionel. Also rebase on f99e43d606e ("anv: switch to use runtime physical device properties infrastructure").

Reviewed-by: Ian Romanick <ian.d.roman...@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>

URL: http://cgit.freedesktop.org/mesa/mesa/commit/?id=ff16458478eec50b04190f58802dde5d4d3e99d7
Author: Caio Oliveira <caio.olive...@intel.com>
Date: Fri Jun 16 16:47:45 2023 -0700

intel/dev: Add cooperative matrix configuration information

v2: Prefix type names with INTEL_CMAT_. Suggested by Lionel.

Reviewed-by: Ian Romanick <ian.d.roman...@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>

URL: http://cgit.freedesktop.org/mesa/mesa/commit/?id=6b14da33ad3aa8a30ed5e479eace8bc6470095a7
Author: Ian Romanick <ian.d.roman...@intel.com>
Date: Mon Oct 9 13:54:38 2023 -0700

intel/fs: nir: Add nir_intrinsic_dpas_intel

v2: Fix parameter order in nir_intrinsic_dpas_intel to DPAS conversion.

v3: Fix float16 destination DPAS on DG2.

v4: Use nir_component_mask(...) instead of 0xffff. Suggested by Caio.

v5: Rebase on !26323.
Reviewed-by: Caio Oliveira <caio.olive...@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>

URL: http://cgit.freedesktop.org/mesa/mesa/commit/?id=3756f605586fb2dcf53d892606152ecc5ce1ad1d
Author: Ian Romanick <ian.d.roman...@intel.com>
Date: Tue Oct 10 15:35:46 2023 -0700

intel/fs: DPAS lowering

Implements integer dot product lowering both with and without DP4A. Implements half-float dot product lowering.

There are a couple FINISHME comments describing future optimizations.

v2: Add a brw_compiler::lower_dpas flag to track when the lowering should be applied.

v3: Use is_null() instead of checking file != ARF. Suggested by Caio.

Reviewed-by: Caio Oliveira <caio.olive...@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>

URL: http://cgit.freedesktop.org/mesa/mesa/commit/?id=3cb96255397747ecef3f824064ca0afba349c50d
Author: Ian Romanick <ian.d.roman...@intel.com>
Date: Mon Oct 16 14:22:51 2023 -0700

intel/fs: Fix scoreboarding for DPAS

v2: Remove all mention of DPASW. Suggested by Curro and Caio.

Reviewed-by: Francisco Jerez <curroje...@riseup.net>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>

URL: http://cgit.freedesktop.org/mesa/mesa/commit/?id=eb1f19d7bf194574b984033754a301d1407f24d5
Author: Ian Romanick <ian.d.roman...@intel.com>
Date: Mon Sep 25 17:40:01 2023 -0700

intel/compiler: Validation for DPAS instructions

v2: s/regiser/register/g in messages. Noticed by Caio. Add more context to the sub-byte precision error message. Suggested by Caio.

Reviewed-by: Caio Oliveira <caio.olive...@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>

URL: http://cgit.freedesktop.org/mesa/mesa/commit/?id=1c92dad5cb7f5d46dfaf56d2f9ce0203c2fbefbe
Author: Ian Romanick <ian.d.roman...@intel.com>
Date: Mon Oct 9 16:31:41 2023 -0700

intel/disasm: Disassembly support for DPAS

v2: Fix regioning in src[012]_dpas_3src. Noticed by Caio. Treat DPAS as unordered.
Suggested by Curro.

Reviewed-by: Caio Oliveira <caio.olive...@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>

URL: http://cgit.freedesktop.org/mesa/mesa/commit/?id=e666872c751bedd1e4c2e1231644c14ed18639e7
Author: Ian Romanick <ian.d.roman...@intel.com>
Date: Wed Sep 20 12:42:24 2023 -0700

intel/compiler: Initial bits for DPAS instruction

v2: Add brw_ir_performance.cpp and brw_fs_generator.cpp changes. Fix overlapping register allocation (via has_source_and_destination_hazard). Fix incorrect destination register file encoding.

v3: Prevent lower_regioning from trying to "fix" DPAS sources.

v4: Add instruction latency information for scheduling and perf estimates.

v5: Remove all mention of DPASW. Suggested by Curro and Caio. Update the comment in fs_inst::has_source_and_destination_hazard. Suggested by Caio.

v6: Add some comments near the src2 calculation in fs_inst::size_read. Suggested by Caio.

Reviewed-by: Caio Oliveira <caio.olive...@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>

URL: http://cgit.freedesktop.org/mesa/mesa/commit/?id=3a35f8b29bb9b6a92f98e8bb897bd444a54ca255
Author: Ian Romanick <ian.d.roman...@intel.com>
Date: Tue Oct 3 11:25:36 2023 -0700

intel/cmat: Lower cmat_load and cmat_store

v2: Add support for non-constant stride.

v3: Explain B matrices (a little bit) in get_slice_type_from_desc. Suggested by Caio.

Reviewed-by: Caio Oliveira <caio.olive...@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>

URL: http://cgit.freedesktop.org/mesa/mesa/commit/?id=502be565da052e91adfa596945d5d55f7565a203
Author: Ian Romanick <ian.d.roman...@intel.com>
Date: Fri Jul 21 16:06:48 2023 -0700

intel/cmat: Add lowering for cmat_bitcast

v2: Use nir_component_mask(...) instead of 0xffff. Assert that source and destination are same size. Both suggested by Caio.
Reviewed-by: Caio Oliveira <caio.olive...@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>

URL: http://cgit.freedesktop.org/mesa/mesa/commit/?id=7303315a8b5d16dc269359e19a8edcee4af99823
Author: Ian Romanick <ian.d.roman...@intel.com>
Date: Fri Jul 14 11:34:44 2023 -0700

intel/cmat: Enable packed formats for scalar ops

v2: Use nir_pack_bits and nir_unpack_bits to simplify coop_scalar handling. This saved 13 lines of code.

v3: Allow packing factor 2 and packing factor 1 elements to be stored in 16-bit integers.

Reviewed-by: Caio Oliveira <caio.olive...@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>

URL: http://cgit.freedesktop.org/mesa/mesa/commit/?id=26c4acd8ee58239dadb0dcaf59703c7510ebbb9a
Author: Ian Romanick <ian.d.roman...@intel.com>
Date: Thu Jul 13 11:08:54 2023 -0700

intel/cmat: Enable packed formats for binary ops

v2: Use nir_pack_bits and nir_unpack_bits to simplify coop_binary handling. This saved 13 lines of code.

v3: Allow packing factor 2 and packing factor 1 elements to be stored in 16-bit integers.

Reviewed-by: Caio Oliveira <caio.olive...@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>

URL: http://cgit.freedesktop.org/mesa/mesa/commit/?id=0d314eb3ccdbbc9c050c9432ee3713da5a9853c7
Author: Ian Romanick <ian.d.roman...@intel.com>
Date: Thu Jul 13 11:05:16 2023 -0700

intel/cmat: Enable packed formats for unary, length, and construct

With this, a minimal test case passes:

    void main()
    {
        coopmat<float16_t, gl_ScopeSubgroup, M, N, gl_MatrixUseA> matA;
        coopmat<float, gl_ScopeSubgroup, M, N, gl_MatrixUseA> matR;

        matA = coopmat<float16_t, gl_ScopeSubgroup, M, N, gl_MatrixUseA>(2.0);
        matR = coopmat<float, gl_ScopeSubgroup, M, N, gl_MatrixUseA>(matA);

        coopMatStore(matR, result, 0, N, gl_CooperativeMatrixLayoutRowMajor);
    }

v2: Use nir_vec instead of explicit nir_vec{2,4}. Also fixes a typo in one of the 4x8 cases.

v3: Use nir_pack_bits and nir_unpack_bits to dramatically simplify coop_unary handling. This saved 67 lines of code.

v4: Allow packing factor 2 and packing factor 1 elements to be stored in 16-bit integers.

v5: Massive update to the comment in lower_cooperative_matrix_unary_op with some suggestions from Caio. Add a comment and assertion around `nir_def *v[4]`. Suggested by Caio.

Reviewed-by: Caio Oliveira <caio.olive...@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>

URL: http://cgit.freedesktop.org/mesa/mesa/commit/?id=75388a71c932db7114a6980ef818b6f50236d6f9
Author: Ian Romanick <ian.d.roman...@intel.com>
Date: Thu Jun 29 18:21:44 2023 -0700

intel/cmat: Add lowering for cmat_insert and cmat_extract

v2: Use nir_component_mask(...) instead of 0xffff. Suggested by Caio.

Reviewed-by: Caio Oliveira <caio.olive...@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>

URL: http://cgit.freedesktop.org/mesa/mesa/commit/?id=a2ded5b26cbaa7ee5f433f046b5f2c559329740e
Author: Ian Romanick <ian.d.roman...@intel.com>
Date: Wed Jul 12 17:50:17 2023 -0700

intel/cmat: Update get_slice_type for packed slices

Also splits off another function, get_slice_type_from_desc, that will be used in future commits.

v2: Allow packing factor 2 and packing factor 1 elements to be stored in 16-bit integers.

v3: Use glsl_base_type_get_bit_size.

v4: Adjust packing so that a single row fills an entire GRF.

v5: Add comment for get_packing_factor and some other cleanups there. s/cooperative_matrix/cmat/. Tighten the validation of len in gt_slice_from_desc. All suggested by Caio.
Reviewed-by: Caio Oliveira <caio.olive...@intel.com>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>

URL: http://cgit.freedesktop.org/mesa/mesa/commit/?id=dba6451ce8113b7f81df95897d666d37ae5b8cee
Author: Caio Oliveira <caio.olive...@intel.com>
Date: Tue Jun 13 19:45:49 2023 -0700

intel/cmat: Add pass to lower cooperative matrix to subgroup operations

This is just the skeleton of the implementation. Future commits will fill it all in.

v2: Move to src/intel/compiler

v3 (idr): Use vecN instead of array[N] for slice type.

v4 (idr): Refactor lower_cooperative_matrix_load and lower_cooperative_matrix_store into a single function.

v5 (idr): Remove old, verbose debug logging. Assert that entry is not NULL in get_coop_type_for_slice. Use nir_component_mask(...) instead of 0xffff. s/cooperative_matrix/cmat/. All suggested by Caio.

Reviewed-by: Ian Romanick <ian.d.roman...@intel.com>
Reviewed-by: Caio Oliveira <caio.olive...@intel.com>

I put both R-b on this because, at this point, we've each done equal parts authoring and reviewing.

Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/25994>
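
Note on the packed-format commits in this series: several of them mention a "packing factor" and the use of nir_pack_bits and nir_unpack_bits. The following is a rough scalar model of that idea, not code from the series. The helper names packing_factor, pack_elements, and unpack_element are hypothetical. The formula (container size divided by element size) and the element order (element 0 in the low bits) are simplifying assumptions; per the v4 note on get_slice_type, the actual packing choice also depends on making a row fill a GRF, which this sketch does not model.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical simplification: how many elements of element_bits
 * fit in one container of container_bits. */
unsigned packing_factor(unsigned container_bits, unsigned element_bits)
{
   assert(element_bits > 0 && container_bits % element_bits == 0);
   return container_bits / element_bits;
}

/* Mask covering the low element_bits bits of a 32-bit word. */
static uint32_t element_mask(unsigned element_bits)
{
   return element_bits >= 32 ? UINT32_MAX : (1u << element_bits) - 1;
}

/* Pack count elements into one 32-bit word, element 0 in the low
 * bits (an assumption of this sketch). */
uint32_t pack_elements(const uint32_t *elems, unsigned count,
                       unsigned element_bits)
{
   assert(count * element_bits <= 32);
   uint32_t packed = 0;
   for (unsigned i = 0; i < count; i++)
      packed |= (elems[i] & element_mask(element_bits)) << (i * element_bits);
   return packed;
}

/* Recover element index from a packed 32-bit word. */
uint32_t unpack_element(uint32_t packed, unsigned index,
                        unsigned element_bits)
{
   assert((index + 1) * element_bits <= 32);
   return (packed >> (index * element_bits)) & element_mask(element_bits);
}
```

Under these assumptions, 8-bit elements in a 32-bit container have packing factor 4, and 16-bit elements in a 16-bit container have packing factor 1, which corresponds to the "stored in 16-bit integers" case in the version notes.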