On 2/4/2026 9:21 AM, Nikola Ratkovac wrote:
This patch introduces a vector cost model for the Spacemit-X60 core.
The model is LMUL-aware: measurements show that vector instruction
latency and throughput vary significantly with LMUL, so the model
distinguishes between the m1/m2/m4/m8 cases.
To keep the machine description manageable, a new 'vector_lmul'
attribute is introduced to map RVV modes to their corresponding LMUL
values. The costs are based on llvm-mca performance simulations
and microbenchmarks, with additional stress tests used to validate
and adjust individual instruction types.
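To illustrate the idea, a minimal sketch of what such an attribute could look like in the machine description (the attribute value set, mode lists, and default here are illustrative, not the patch's actual definition):

```lisp
;; Hypothetical sketch: derive LMUL from the insn's mode attribute.
;; The mode lists below are abbreviated examples, not exhaustive.
(define_attr "vector_lmul" "m1,m2,m4,m8,unknown"
  (cond [(eq_attr "mode" "RVVM1SI,RVVM1DI,RVVM1SF,RVVM1DF")
           (const_string "m1")
         (eq_attr "mode" "RVVM2SI,RVVM2DI,RVVM2SF,RVVM2DF")
           (const_string "m2")
         (eq_attr "mode" "RVVM4SI,RVVM4DI,RVVM4SF,RVVM4DF")
           (const_string "m4")
         (eq_attr "mode" "RVVM8SI,RVVM8DI,RVVM8SF,RVVM8DF")
           (const_string "m8")]
        (const_string "unknown")))
```

Insn reservations can then test `(eq_attr "vector_lmul" "m8")` and so on, instead of enumerating modes in every reservation condition.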
On selected numerical benchmarks this results in performance improvements
of ~3%, while instruction counts remain effectively unchanged (<0.1%).
| Benchmark        | Metric | Trunk            | Vector Cost Model | Δ (%)  |
|------------------|--------|------------------|-------------------|--------|
| SciMark2-C       | cycles | 311,538,498,801  | 300,670,104,666   | -3.49% |
| tramp3d-v4       | cycles | 23,673,618,009   | 22,916,964,182    | -3.20% |
| FreeBench/neural | cycles | 471,768,472      | 454,850,594       | -3.59% |
Benchmarks were run from the LLVM test-suite
(MultiSource/Benchmarks) using:
taskset -c 0 perf stat -r 10 ./...
SciMark2-C, FreeBench/neural, and tramp3d-v4
were used as representative numerical workloads.
For tramp3d-v4, the workload parameters (--cartvis 1.0 0.0, --rhomin 1e-8,
-n 20) increase floating-point intensity and dependency pressure, placing
greater stress on the scheduler.
2026-02-04  Nikola Ratkovac  <[email protected]>

gcc/ChangeLog:

	* config/riscv/spacemit-x60.md: Add primary vector pipeline model
	for the Spacemit-X60 core.
	(vector_lmul): New attribute mapping machine modes to LMUL.
	(spacemit_x60_dummy): Rename from spacemi6_x60_dummy.
So at a high level, I would clamp all the reservations at 7 cycles --
beyond that the DFA blows up badly, which will significantly harm build
times for GCC itself. And in reality it's extremely difficult to find
enough independent instructions to fill latencies of more than a few
cycles, so the delta in code quality is minimal. It's fine to keep the
latency higher, but clamp the number of cycles in the reservation.
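Concretely, the decoupling would look something like this sketch (automaton, unit, and reservation names are illustrative, and the 16-cycle latency is a made-up stand-in for a measured value):

```lisp
;; Sketch only: the published latency stays at the measured value,
;; but the functional unit is reserved for at most 7 cycles, keeping
;; the DFA small.
(define_automaton "x60_vec")
(define_cpu_unit "x60_vpu" "x60_vec")

(define_insn_reservation "x60_vimul_m8" 16   ; latency seen by the scheduler
  (and (eq_attr "tune" "spacemit_x60")
       (and (eq_attr "type" "vimul")
            (eq_attr "vector_lmul" "m8")))
  "x60_vpu*7")                               ; unit busy for only 7 cycles
```

The scheduler still tries to cover the full 16-cycle latency, but the automaton only has to track 7 cycles of unit occupancy.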
Where are you getting the pipeline information from? LLVM? The C908
manuals? Something from SpacemIT?
I don't see that you try to handle the dual vector integer ALUs.
Essentially there are two 128-bit ALUs. Some operations can go into both
ALUs, some are restricted to either ALU0 or ALU1. When an operation is
handled in both, a 256-bit VLEN instruction has an observed latency of 1c
(i.e., something like a vadd with lmul1). When an operation is restricted
to just one of the ALUs, the observed latency is 2c because the data has
to be double-pumped (shifts I think fall into this category). At least
that's my understanding of how the unit works.
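If that understanding is right, the DFA could model it roughly like this (a sketch under those assumptions; unit names, the choice of ALU1 for shifts, and the type values used are illustrative):

```lisp
;; Sketch: two 128-bit integer vector ALUs.
(define_automaton "x60_vint")
(define_cpu_unit "x60_valu0,x60_valu1" "x60_vint")

;; Ops that can issue to either unit: 1c observed latency at lmul1.
(define_insn_reservation "x60_vialu" 1
  (and (eq_attr "tune" "spacemit_x60")
       (eq_attr "type" "vialu"))
  "x60_valu0|x60_valu1")

;; Ops restricted to one unit hold it for two cycles (double-pumped
;; halves of the 256-bit VLEN), giving the 2c observed latency.
(define_insn_reservation "x60_vshift" 2
  (and (eq_attr "tune" "spacemit_x60")
       (eq_attr "type" "vshift"))
  "x60_valu1*2")
```

The `|` alternative lets the automaton pick whichever ALU is free, while the `*2` reservation captures the structural hazard from double-pumping.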
I have no clue how the vector FP unit works, but I wouldn't be
surprised if under the hood it's also a 128-bit data path with dual
units giving the illusion of a 256-bit data path with shorter latencies.
Jeff