> This patch introduces a vector cost model for the Spacemit-X60 core,
> using dynamic LMUL scaling with the -madjust-lmul-cost flag.
>
> Compared to the previous patch, I dropped the local 'vector_lmul'
> attribute and the corresponding LMUL-aware cost logic in spacemit-x60.md.
> Instead, Spacemit-X60 tuning now enables -madjust-lmul-cost implicitly,
> and riscv_sched_adjust_cost is updated so that the adjustment applies to
> spacemit_x60 in addition to the generic out-of-order model.
>
> The stress tests I previously used to tune individual instruction costs
> (with the LMUL-aware logic implemented directly in spacemit-x60.md)
> now show a regression in performance. The most likely cause is the implicit
> -madjust-lmul-cost scaling, since some instructions performed better
> with non-power-of-two scaling (or with no LMUL scaling at all), so the
> uniform ×(1,2,4,8) adjustment affects performance.
>
> Updated performance results:
>
> | Benchmark        | Metric | Trunk           | Vector Cost Model | Δ (%)   |
> |------------------|--------|-----------------|-------------------|---------|
> | SciMark2-C       | cycles | 311,450,555,453 | 313,278,899,107   | +0.56%  |
> | tramp3d-v4       | cycles | 23,788,980,247  | 21,073,526,428    | -12.89% |
> | Freebench/neural | cycles | 471,707,641     | 435,842,612       | -8.23%  |
>
> Benchmarks were run from the LLVM test-suite
> (MultiSource/Benchmarks) using:
>
>   taskset -c 0 perf stat -r 10 ./...

How sure are we about these results?  It has been notoriously difficult
to obtain reliable benchmark numbers on the BPI.  Do the results hold
after a reboot or on the next day?  What about an even higher number of
iterations?

I find it difficult to understand why two benchmarks improve a lot
while one regresses.  If the LMUL scaling is incorrect, wouldn't we
expect similar behavior for all three?  Or does SciMark have a
different footprint WRT instructions and e.g. uses some insns more for
which the uniform scaling doesn't hold?

-- 
Regards
 Robin
