https://github.com/ritter-x2a created https://github.com/llvm/llvm-project/pull/126887
gfx940 and gfx941 are no longer supported. This is one of a series of PRs to remove them from the code base. This PR removes all documentation occurrences of gfx940/gfx941 except for the gfx940 ISA description, which will be the subject of a separate PR. For SWDEV-512631 >From 12460f083060a7e330975a7354bdc019f670f386 Mon Sep 17 00:00:00 2001 From: Fabian Ritter <fabian.rit...@amd.com> Date: Wed, 12 Feb 2025 05:45:01 -0500 Subject: [PATCH] [AMDGPU][docs] Replace gfx940 and gfx941 with gfx942 in llvm/docs gfx940 and gfx941 are no longer supported. This is one of a series of PRs to remove them from the code base. This PR removes all documentation occurrences of gfx940/gfx941 except for the gfx940 ISA description, which will be the subject of a separate PR. For SWDEV-512631 --- llvm/docs/AMDGPUOperandSyntax.rst | 4 +- llvm/docs/AMDGPUUsage.rst | 97 ++++++++++--------------------- 2 files changed, 34 insertions(+), 67 deletions(-) diff --git a/llvm/docs/AMDGPUOperandSyntax.rst b/llvm/docs/AMDGPUOperandSyntax.rst index ff6ec6cf71ff2..e8a76322fe76a 100644 --- a/llvm/docs/AMDGPUOperandSyntax.rst +++ b/llvm/docs/AMDGPUOperandSyntax.rst @@ -63,7 +63,7 @@ Note: *N* and *K* must satisfy the following conditions: * 0 <= *K* <= 255. * *K-N+1* must be in the range from 1 to 12 or equal to 16 or 32. -GFX90A and GFX940 have an additional alignment requirement: +GFX90A and GFX942 have an additional alignment requirement: pairs of *vector* registers must be even-aligned (first register must be even). @@ -183,7 +183,7 @@ Note: *N* and *K* must satisfy the following conditions: * 0 <= *K* <= 255. * *K-N+1* must be in the range from 1 to 12 or equal to 16 or 32. -GFX90A and GFX940 have an additional alignment requirement: +GFX90A and GFX942 have an additional alignment requirement: pairs of *accumulator* registers must be even-aligned (first register must be even). diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst index 83ec1eecb6e5e..14b3b6fce9e70 100644 --- a/llvm/docs/AMDGPUUsage.rst +++ b/llvm/docs/AMDGPUUsage.rst @@ -323,7 +323,7 @@ Every processor supports every OS ABI (see :ref:`amdgpu-os`) with the following Add product names. - **GCN GFX9 (Vega)** [AMD-GCN-GFX900-GFX904-VEGA]_ [AMD-GCN-GFX906-VEGA7NM]_ [AMD-GCN-GFX908-CDNA1]_ [AMD-GCN-GFX90A-CDNA2]_ [AMD-GCN-GFX940-GFX942-CDNA3]_ + **GCN GFX9 (Vega)** [AMD-GCN-GFX900-GFX904-VEGA]_ [AMD-GCN-GFX906-VEGA7NM]_ [AMD-GCN-GFX908-CDNA1]_ [AMD-GCN-GFX90A-CDNA2]_ [AMD-GCN-GFX942-CDNA3]_ ----------------------------------------------------------------------------------------------------------------------- ``gfx900`` ``amdgcn`` dGPU - xnack - Absolute - *rocm-amdhsa* - Radeon Vega flat - *pal-amdhsa* Frontier Edition @@ -378,20 +378,6 @@ Every processor supports every OS ABI (see :ref:`amdgpu-os`) with the following - Ryzen 3 Pro 4350G - Ryzen 3 Pro 4350GE - ``gfx940`` ``amdgcn`` dGPU - sramecc - Architected *TBA* - - tgsplit flat - - xnack scratch .. TODO:: - - kernarg preload - Packed - work-item Add product - IDs names. - - ``gfx941`` ``amdgcn`` dGPU - sramecc - Architected *TBA* - - tgsplit flat - - xnack scratch .. TODO:: - - kernarg preload - Packed - work-item Add product - IDs names. - ``gfx942`` ``amdgcn`` dGPU - sramecc - Architected - AMD Instinct MI300X - tgsplit flat - AMD Instinct MI300A - xnack scratch @@ -583,10 +569,10 @@ Generic processor code objects are versioned. See :ref:`amdgpu-generic-processor - ``v_dot2_f32_f16`` - ``gfx9-4-generic`` ``amdgcn`` - ``gfx940`` - sramecc - Architected FP8 and BF8 instructions, - - ``gfx941`` - tgsplit flat scratch FP8 and BF8 conversion - - ``gfx942`` - xnack - Packed instructions, as well as - - ``gfx950`` - kernarg preload work-item instructions with XF32 format + ``gfx9-4-generic`` ``amdgcn`` - ``gfx942`` - sramecc - Architected FP8 and BF8 instructions, + - ``gfx950`` - tgsplit flat scratch FP8 and BF8 conversion + - xnack - Packed instructions, as well as + - kernarg preload work-item instructions with XF32 format IDs support are not available. ``gfx10-1-generic`` ``amdgcn`` - ``gfx1010`` - xnack - Absolute flat - The following instructions are @@ -4974,7 +4960,7 @@ The fields used by CP for code objects before V3 also match those specified in bytes 383:352 4 bytes COMPUTE_PGM_RSRC3 GFX6-GFX9 Reserved, must be 0. - GFX90A, GFX940 + GFX90A, GFX942 Compute Shader (CS) program settings used by CP to set up @@ -5059,7 +5045,7 @@ The fields used by CP for code objects before V3 also match those specified in 463:460 4 bits Reserved, must be 0. 470:464 7 bits KERNARG_PRELOAD_SPEC_LENGTH GFX6-GFX9 - Reserved, must be 0. - GFX90A, GFX940 + GFX90A, GFX942 - The number of dwords from the kernarg segment to preload into User SGPRs before kernel @@ -5067,7 +5053,7 @@ The fields used by CP for code objects before V3 also match those specified in :ref:`amdgpu-amdhsa-kernarg-preload`). 479:471 9 bits KERNARG_PRELOAD_SPEC_OFFSET GFX6-GFX9 - Reserved, must be 0. - GFX90A, GFX940 + GFX90A, GFX942 - An offset in dwords into the kernarg segment to begin preloading data into User @@ -5093,7 +5079,7 @@ The fields used by CP for code objects before V3 also match those specified in GFX6-GFX9 - vgprs_used 0..256 - max(0, ceil(vgprs_used / 4) - 1) - GFX90A, GFX940 + GFX90A, GFX942 - vgprs_used 0..512 - vgprs_used = align(arch_vgprs, 4) + acc_vgprs @@ -5559,7 +5545,7 @@ The fields used by CP for code objects before V3 also match those specified in .. - .. table:: compute_pgm_rsrc3 for GFX90A, GFX940 + .. table:: compute_pgm_rsrc3 for GFX90A, GFX942 :name: amdgpu-amdhsa-compute_pgm_rsrc3-gfx90a-table ======= ======= =============================== =========================================================================== @@ -9970,15 +9956,15 @@ only accessed by a single thread, and is always write-before-read, there is never a need to invalidate these entries from the L1 cache. Hence all cache invalidates are done as ``*_vol`` to only invalidate the volatile cache lines. -The code sequences used to implement the memory model for GFX940, GFX941, GFX942 -are defined in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-gfx941-gfx942-table`. +The code sequences used to implement the memory model for GFX942 are defined in +table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx942-table`. - .. table:: AMDHSA Memory Model Code Sequences GFX940, GFX941, GFX942 - :name: amdgpu-amdhsa-memory-model-code-sequences-gfx940-gfx941-gfx942-table + .. table:: AMDHSA Memory Model Code Sequences GFX942 + :name: amdgpu-amdhsa-memory-model-code-sequences-gfx942-table ============ ============ ============== ========== ================================ LLVM Instr LLVM Memory LLVM Memory AMDGPU AMDGPU Machine Code - Ordering Sync Scope Address GFX940, GFX941, GFX942 + Ordering Sync Scope Address GFX942 Space ============ ============ ============== ========== ================================ **Non-Atomic** @@ -10013,18 +9999,12 @@ are defined in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-gfx9 load *none* *none* - local 1. ds_load store *none* *none* - global - !volatile & !nontemporal - generic - - private 1. GFX940, GFX941 + - private 1. GFX942 - constant buffer/global/flat_store - sc0=1 sc1=1 - GFX942 - buffer/global/flat_store - !volatile & nontemporal - 1. GFX940, GFX941 - buffer/global/flat_store - nt=1 sc0=1 sc1=1 - GFX942 + 1. GFX942 buffer/global/flat_store nt=1 @@ -10696,11 +10676,8 @@ are defined in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-gfx9 **Release Atomic** ------------------------------------------------------------------------------------ - store atomic release - singlethread - global 1. GFX940, GFX941 + store atomic release - singlethread - global 1. GFX942 - wavefront - generic buffer/global/flat_store - sc0=1 sc1=1 - GFX942 - buffer/global/flat_store store atomic release - singlethread - local *If TgSplit execution mode, - wavefront local address space cannot @@ -10738,10 +10715,7 @@ are defined in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-gfx9 store that is being released. - 2. GFX940, GFX941 - buffer/global/flat_store - sc0=1 sc1=1 - GFX942 + 2. GFX942 buffer/global/flat_store sc0=1 store atomic release - workgroup - local *If TgSplit execution mode, @@ -10802,10 +10776,7 @@ are defined in table :ref:`amdgpu-amdhsa-memory-model-code-sequences-gfx940-gfx9 store that is being released. - 3. GFX940, GFX941 - buffer/global/flat_store - sc0=1 sc1=1 - GFX942 + 3. GFX942 buffer/global/flat_store sc1=1 store atomic release - system - global 1. buffer_wbl2 sc0=1 sc1=1 @@ -17563,11 +17534,7 @@ in this description. CDNA 2 :doc:`GFX9<AMDGPU/AMDGPUAsmGFX9>` :doc:`gfx90a<AMDGPU/AMDGPUAsmGFX90a>` - CDNA 3 :doc:`GFX9<AMDGPU/AMDGPUAsmGFX9>` :doc:`gfx940<AMDGPU/AMDGPUAsmGFX940>` - - :doc:`gfx941<AMDGPU/AMDGPUAsmGFX940>` - - :doc:`gfx942<AMDGPU/AMDGPUAsmGFX940>` + CDNA 3 :doc:`GFX9<AMDGPU/AMDGPUAsmGFX9>` :doc:`gfx942<AMDGPU/AMDGPUAsmGFX940>` RDNA 1 :doc:`GFX10 RDNA1<AMDGPU/AMDGPUAsmGFX10>` :doc:`gfx1010<AMDGPU/AMDGPUAsmGFX10>` @@ -17605,7 +17572,7 @@ combinations of operands, refer to one of instruction set architecture manuals [AMD-GCN-GFX6]_, [AMD-GCN-GFX7]_, [AMD-GCN-GFX8]_, [AMD-GCN-GFX900-GFX904-VEGA]_, [AMD-GCN-GFX906-VEGA7NM]_, [AMD-GCN-GFX908-CDNA1]_, [AMD-GCN-GFX90A-CDNA2]_, -[AMD-GCN-GFX940-GFX942-CDNA3]_, [AMD-GCN-GFX10-RDNA1]_, [AMD-GCN-GFX10-RDNA2]_, +[AMD-GCN-GFX942-CDNA3]_, [AMD-GCN-GFX10-RDNA1]_, [AMD-GCN-GFX10-RDNA2]_, [AMD-GCN-GFX11-RDNA3]_ and [AMD-GCN-GFX11-RDNA3.5]_. Operands @@ -18118,7 +18085,7 @@ terminated by an ``.end_amdhsa_kernel`` directive. :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx12-table` ``.amdhsa_user_sgpr_private_segment_buffer`` 0 GFX6-GFX10 Controls ENABLE_SGPR_PRIVATE_SEGMENT_BUFFER in (except :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`. - GFX940) + GFX942) ``.amdhsa_user_sgpr_dispatch_ptr`` 0 GFX6-GFX12 Controls ENABLE_SGPR_DISPATCH_PTR in :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`. ``.amdhsa_user_sgpr_queue_ptr`` 0 GFX6-GFX12 Controls ENABLE_SGPR_QUEUE_PTR in @@ -18129,7 +18096,7 @@ terminated by an ``.end_amdhsa_kernel`` directive. :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`. ``.amdhsa_user_sgpr_flat_scratch_init`` 0 GFX6-GFX10 Controls ENABLE_SGPR_FLAT_SCRATCH_INIT in (except :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`. - GFX940) + GFX942) ``.amdhsa_user_sgpr_private_segment_size`` 0 GFX6-GFX12 Controls ENABLE_SGPR_PRIVATE_SEGMENT_SIZE in :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`. ``.amdhsa_wavefront_size32`` Target GFX10-GFX12 Controls ENABLE_WAVEFRONT_SIZE32 in @@ -18140,8 +18107,8 @@ terminated by an ``.end_amdhsa_kernel`` directive. :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`. ``.amdhsa_system_sgpr_private_segment_wavefront_offset`` 0 GFX6-GFX10 Controls ENABLE_PRIVATE_SEGMENT in (except :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx12-table`. - GFX940) - ``.amdhsa_enable_private_segment`` 0 GFX940, Controls ENABLE_PRIVATE_SEGMENT in + GFX942) + ``.amdhsa_enable_private_segment`` 0 GFX942, Controls ENABLE_PRIVATE_SEGMENT in GFX11-GFX12 :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx12-table`. ``.amdhsa_system_sgpr_workgroup_id_x`` 1 GFX6-GFX12 Controls ENABLE_SGPR_WORKGROUP_ID_X in :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx12-table`. @@ -18162,14 +18129,14 @@ terminated by an ``.end_amdhsa_kernel`` directive. Used to calculate GRANULATED_WAVEFRONT_SGPR_COUNT in :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx12-table`. ``.amdhsa_accum_offset`` Required GFX90A, Offset of a first AccVGPR in the unified register file. - GFX940 Used to calculate ACCUM_OFFSET in + GFX942 Used to calculate ACCUM_OFFSET in :ref:`amdgpu-amdhsa-compute_pgm_rsrc3-gfx90a-table`. ``.amdhsa_reserve_vcc`` 1 GFX6-GFX12 Whether the kernel may use the special VCC SGPR. Used to calculate GRANULATED_WAVEFRONT_SGPR_COUNT in :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx12-table`. ``.amdhsa_reserve_flat_scratch`` 1 GFX7-GFX10 Whether the kernel may use flat instructions to access (except scratch memory. Used to calculate - GFX940) GRANULATED_WAVEFRONT_SGPR_COUNT in + GFX942) GRANULATED_WAVEFRONT_SGPR_COUNT in :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx12-table`. ``.amdhsa_reserve_xnack_mask`` Target GFX8-GFX10 Whether the kernel may trigger XNACK replay. Feature Used to calculate GRANULATED_WAVEFRONT_SGPR_COUNT in @@ -18200,7 +18167,7 @@ terminated by an ``.end_amdhsa_kernel`` directive. ``.amdhsa_fp16_overflow`` 0 GFX9-GFX12 Controls FP16_OVFL in :ref:`amdgpu-amdhsa-compute_pgm_rsrc1-gfx6-gfx12-table`. ``.amdhsa_tg_split`` Target GFX90A, Controls TG_SPLIT in - Feature GFX940, :ref:`amdgpu-amdhsa-compute_pgm_rsrc3-gfx90a-table`. + Feature GFX942, :ref:`amdgpu-amdhsa-compute_pgm_rsrc3-gfx90a-table`. Specific GFX11-GFX12 (tgsplit) ``.amdhsa_workgroup_processor_mode`` Target GFX10-GFX12 Controls ENABLE_WGP_MODE in @@ -18228,9 +18195,9 @@ terminated by an ``.end_amdhsa_kernel`` directive. ``.amdhsa_exception_int_div_zero`` 0 GFX6-GFX12 Controls ENABLE_EXCEPTION_INT_DIVIDE_BY_ZERO in :ref:`amdgpu-amdhsa-compute_pgm_rsrc2-gfx6-gfx12-table`. ``.amdhsa_user_sgpr_kernarg_preload_length`` 0 GFX90A, Controls KERNARG_PRELOAD_SPEC_LENGTH in - GFX940 :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`. + GFX942 :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`. ``.amdhsa_user_sgpr_kernarg_preload_offset`` 0 GFX90A, Controls KERNARG_PRELOAD_SPEC_OFFSET in - GFX940 :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`. + GFX942 :ref:`amdgpu-amdhsa-kernel-descriptor-v3-table`. ======================================================== =================== ============ =================== .amdgpu_metadata @@ -18400,7 +18367,7 @@ Additional Documentation .. [AMD-GCN-GFX906-VEGA7NM] `AMD Vega 7nm Instruction Set Architecture <https://gpuopen.com/wp-content/uploads/2019/11/Vega_7nm_Shader_ISA_26November2019.pdf>`__ .. [AMD-GCN-GFX908-CDNA1] `AMD Instinct MI100 Instruction Set Architecture <https://developer.amd.com/wp-content/resources/CDNA1_Shader_ISA_14December2020.pdf>`__ .. [AMD-GCN-GFX90A-CDNA2] `AMD Instinct MI200 Instruction Set Architecture <https://developer.amd.com/wp-content/resources/CDNA2_Shader_ISA_4February2022.pdf>`__ -.. [AMD-GCN-GFX940-GFX942-CDNA3] `AMD Instinct MI300 Instruction Set Architecture <https://www.amd.com/content/dam/amd/en/documents/instinct-tech-docs/instruction-set-architectures/amd-instinct-mi300-cdna3-instruction-set-architecture.pdf>`__ +.. [AMD-GCN-GFX942-CDNA3] `AMD Instinct MI300 Instruction Set Architecture <https://www.amd.com/content/dam/amd/en/documents/instinct-tech-docs/instruction-set-architectures/amd-instinct-mi300-cdna3-instruction-set-architecture.pdf>`__ .. [AMD-GCN-GFX10-RDNA1] `AMD RDNA 1.0 Instruction Set Architecture <https://gpuopen.com/wp-content/uploads/2019/08/RDNA_Shader_ISA_5August2019.pdf>`__ .. [AMD-GCN-GFX10-RDNA2] `AMD RDNA 2 Instruction Set Architecture <https://developer.amd.com/wp-content/resources/RDNA2_Shader_ISA_November2020.pdf>`__ .. [AMD-GCN-GFX11-RDNA3] `AMD RDNA 3 Instruction Set Architecture <https://developer.amd.com/wp-content/resources/RDNA3_Shader_ISA_December2022.pdf>`__ _______________________________________________ llvm-branch-commits mailing list llvm-branch-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-branch-commits