[clang] [RISCV] Add riscv_simd.h for P extension intrinsics (PR #181115)

via cfe-commits Mon, 23 Feb 2026 23:50:26 -0800

sihuan wrote:

> Have you tested if this generates the expected instructions?


Yes, I have tested the codegen. For most cases, it generates the expected 
instructions perfectly.

However, there are two specific cases where the generated assembly is 
sub-optimal. I think these are backend lowering and optimization opportunities 
rather than issues with the frontend intrinsics themselves.

#### 1. RV32 handling 64-bit vectors (e.g., `int8x8_t`):
Instead of generating a register pair instruction like `padd.db`, the backend 
currently splits the operation into two 32-bit instructions.
```c
int8x8_t test_padd_i8x8(int8x8_t a, int8x8_t b) {
    return __riscv_padd_i8x8(a, b);
}
```
Compiling this with `clang -cc1 -triple riscv32 -target-feature +experimental-p -mllvm 
-riscv-enable-p-ext-simd-codegen -O2 -S` yields:
```assembly
test_padd_i8x8:
        padd.b    a0, a2, a0
        padd.b    a1, a3, a1
        ret
```
The frontend correctly emits the `<8 x i8>` addition in IR, but it seems the 
backend currently lacks the patterns to lower this into register pair 
instructions. This limitation is also documented in the existing backend test: 
https://github.com/llvm/llvm-project/blob/0b8bb80e27c6051794873a16a0eaf63501a6a1c7/llvm/test/CodeGen/RISCV/calling-conv-p-ext-vector.ll#L27-L40
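
To illustrate why the split form is still functionally correct (just not the single instruction one would hope for), here is a minimal scalar sketch in plain C. The `model_*` helpers are hypothetical names invented for this illustration; they are not part of `riscv_simd.h` or LLVM, and they only model lane-wise wrapping byte addition, not the actual instruction encodings.

```c
#include <stdint.h>

/* Hypothetical scalar model: lane-wise wrapping 8-bit add on a 32-bit word
 * (four independent byte lanes, no carry between lanes). */
static uint32_t model_padd_b32(uint32_t a, uint32_t b) {
    uint32_t r = 0;
    for (int lane = 0; lane < 4; ++lane) {
        uint8_t x = (uint8_t)(a >> (8 * lane));
        uint8_t y = (uint8_t)(b >> (8 * lane));
        r |= (uint32_t)(uint8_t)(x + y) << (8 * lane);
    }
    return r;
}

/* Hypothetical scalar model: the same operation on eight byte lanes,
 * i.e. what a single 64-bit-wide packed add would compute. */
static uint64_t model_padd_b64(uint64_t a, uint64_t b) {
    uint64_t r = 0;
    for (int lane = 0; lane < 8; ++lane) {
        uint8_t x = (uint8_t)(a >> (8 * lane));
        uint8_t y = (uint8_t)(b >> (8 * lane));
        r |= (uint64_t)(uint8_t)(x + y) << (8 * lane);
    }
    return r;
}

/* Models the split lowering: one 32-bit packed add per half,
 * mirroring the two `padd.b` instructions in the output above. */
static uint64_t model_padd_b64_split(uint64_t a, uint64_t b) {
    uint32_t lo = model_padd_b32((uint32_t)a, (uint32_t)b);
    uint32_t hi = model_padd_b32((uint32_t)(a >> 32), (uint32_t)(b >> 32));
    return ((uint64_t)hi << 32) | lo;
}
```

Because the byte lanes never interact, the split and unsplit models agree for every input, so the remaining gap is purely about instruction selection.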

#### 2. RV64 handling 32-bit vectors (e.g., `int8x4_t`):
The assembly includes redundant shift instructions.
```c
int8x4_t test_padd_i8x4(int8x4_t a, int8x4_t b) {
    return __riscv_padd_i8x4(a, b);
}
```
Compiling this with `clang -cc1 -triple riscv64 -target-feature +experimental-p -mllvm 
-riscv-enable-p-ext-simd-codegen -O2 -S` yields:
```assembly
test_padd_i8x4:
        padd.b    a0, a0, a1
        slli      a0, a0, 32
        srli      a0, a0, 32
        ret
```
These shift instructions appear because the frontend uses integer coercion 
(`zext i32 ... to i64`) to return the 32-bit aggregate in a 64-bit register 
according to the ABI. The backend faithfully executes the zero-extension, but 
misses the opportunity to optimize the shifts away.
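
For reference, a small plain-C sketch of what the `slli`/`srli` pair computes on a 64-bit register. The function names are hypothetical and exist only for this illustration; the sketch shows that the pair is equivalent to zero-extending the low 32 bits, and it makes no claim about when the backend may legally drop it.

```c
#include <stdint.h>

/* Models `slli a0, a0, 32` followed by `srli a0, a0, 32`:
 * shift left then logical shift right, clearing bits 63..32. */
static uint64_t model_shift_pair(uint64_t reg) {
    return (reg << 32) >> 32;
}

/* Models the IR-level `zext i32 ... to i64` applied to the low 32 bits. */
static uint64_t model_zext_low32(uint64_t reg) {
    return (uint64_t)(uint32_t)reg;
}
```

Both helpers return the same value for any input, which is why the two shifts show up wherever the zero-extension survives to instruction selection.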

Since the Clang frontend is emitting the correct vector arithmetic and ABI 
coercion IR, I think it might be better to address these codegen improvements 
in subsequent backend patches. What are your thoughts on this?

https://github.com/llvm/llvm-project/pull/181115