cbalint13 commented on issue #17625:
URL: https://github.com/apache/tvm/issues/17625#issuecomment-2648825520
Hi @JieGH,
> Hi @cbalint13, Thanks for the advice. I now have a method for choosing VLEN, which is using an additional flag: `tvm.target.Target("llvm -jit=orcjit -mtriple=riscv64 -mcpu=spacemit-x60 -mattr=+64bit,+m,+a,+f,+d,+c,+zfh,+v,+zvl256b")`
>
> By specifying zvl256b flags, it means enable 'Zvl' (Minimum Vector Length)
> 256. This indeed has an impact on the execution's performance. However,

Yes, another way to tell LLVM the VLEN is via the canonical flags, but we
also need TVM itself to be aware of this.

> 1. the TVM still warns the 128bit sets as default bit length despite the
> zvl flags having been enabled and having an impact on performance. I have not
> yet run the latest update you posted at
> [Handle vector width (VLEN) for RISCV arches #17631](https://github.com/apache/tvm/pull/17631)

You have an older LLVM that does not know about ```-mcpu=spacemit-x60```,
so it falls back to ```generic```.
* You can check the LLVM version from the TVM side:
```
$ python3 -c "import tvm; print(tvm.target.codegen.llvm_version_major())"
20
```
* You can also list which ```-mcpu``` options your installed LLVM offers
for riscv64:
```
$ python -c "import tvm; print(tvm.target.codegen.llvm_get_cpu_archlist(tvm.target.Target('llvm -mtriple=riscv64--')))"
["generic", "generic-rv32", "generic-rv64", "mips-p8700", "rocket",
"rocket-rv32", "rocket-rv64", "rp2350-hazard3", "sifive-7-series",
"sifive-e20", "sifive-e21", "sifive-e24", "sifive-e31", "sifive-e34",
"sifive-e76", "sifive-p450", "sifive-p470", "sifive-p670", "sifive-s21",
"sifive-s51", "sifive-s54", "sifive-s76", "sifive-u54", "sifive-u74",
"sifive-x280", "spacemit-x60", "syntacore-scr1-base",
"syntacore-scr1-max",
"syntacore-scr3-rv32", "syntacore-scr3-rv64", "syntacore-scr4-rv32",
"syntacore-scr4-rv64", "syntacore-scr5-rv32", "syntacore-scr5-rv64",
"syntacore-scr7", "tt-ascalon-d8", "veyron-v1", "xiangshan-nanhu"]
```
For an older LLVM the flags would be: *llvm -device=riscv_cpu -vector-width=256
-mtriple=riscv64-linux-gnu -mcpu=generic-rv64 -mattr=+64bit,+a,+c,+d,+f,+m,+v*
(orcjit is already the default; vector-width informs both TVM and LLVM).
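The fallback described above can be sketched as a small helper. This is only an illustration: `riscv_target` and its parameters are hypothetical names, and reusing the same `-mattr` and `-vector-width` flags when `spacemit-x60` is available is an assumption, not something stated in this thread. The helper takes the CPU list as a plain argument so it runs without TVM installed.

```python
def riscv_target(known_cpus, preferred_cpu="spacemit-x60", vlen=256):
    """Build a TVM target string (hypothetical helper, not a TVM API).

    Falls back to generic-rv64 when the installed LLVM does not list
    the preferred CPU, mirroring the flags suggested above.
    """
    mcpu = preferred_cpu if preferred_cpu in known_cpus else "generic-rv64"
    flags = [
        "llvm",
        "-device=riscv_cpu",
        f"-vector-width={vlen}",  # informs both TVM and LLVM about VLEN
        "-mtriple=riscv64-linux-gnu",
        f"-mcpu={mcpu}",
        "-mattr=+64bit,+a,+c,+d,+f,+m,+v",  # assumption: same attrs in both cases
    ]
    return " ".join(flags)


# With TVM installed, the list could come from the call shown earlier:
#   tvm.target.codegen.llvm_get_cpu_archlist(
#       tvm.target.Target("llvm -mtriple=riscv64--"))
older_llvm_cpus = ["generic", "generic-rv32", "generic-rv64", "rocket"]
print(riscv_target(older_llvm_cpus))
```

The resulting string can then be passed to `tvm.target.Target(...)`.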
> 2. I searched zvl flags for a given matrix mul problem; I changed the zvl
> and measured the performance. The best performance appeared at the vector
> length that the chip should not support. For example if I set zvl256b, the
> execution takes 491ms to complete, if I set zvl8192b, the execution takes 384
> ms to finish, which has over 20% speed up. There is something wrong here.
>
> Any comments on this? Thanks.

Performance also depends on how LLVM optimizes the code; TVM has no
highly specialized optimizations for RISC-V.
TVM emits candidates as intermediate proposals (in the auto-tuning flow),
forwards them to LLVM, and elects only the best-performing ones. Not sure
if you are also trying to tune your model/function, but without a tuning
process TVM likely emits an underperforming variant, even for a simple
matmul operation. There should be a warning about this:
```
WARNING:autotvm:One or more operators have not been tuned.
Please tune your model for better performance. Use DEBUG logging level to
see more details.
```
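To make the "emit candidates, measure, elect the best" idea concrete, here is a toy sketch in plain Python. It is not TVM's tuner and uses no TVM API: `blocked_matmul`, `tune`, and the tile-size candidates are hypothetical, chosen only to show why an untuned default can be slower than a measured choice.

```python
import time


def blocked_matmul(a, b, n, tile):
    """Tiled n x n matmul over flat row-major lists; `tile` is the tuning knob."""
    c = [0.0] * (n * n)
    for i0 in range(0, n, tile):
        for k0 in range(0, n, tile):
            for i in range(i0, min(i0 + tile, n)):
                for k in range(k0, min(k0 + tile, n)):
                    aik = a[i * n + k]
                    for j in range(n):
                        c[i * n + j] += aik * b[k * n + j]
    return c


def tune(n=32, candidates=(4, 8, 16, 32)):
    """Time every candidate tile size and elect the fastest one."""
    a = [float(i % 7) for i in range(n * n)]
    b = [float(i % 5) for i in range(n * n)]
    timings = {}
    for tile in candidates:
        start = time.perf_counter()
        blocked_matmul(a, b, n, tile)
        timings[tile] = time.perf_counter() - start
    best = min(timings, key=timings.get)
    return best, timings


best, timings = tune()
print("best tile:", best)
```

Which candidate wins depends on the machine, which is why measuring on the actual target matters.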
The work done in https://github.com/apache/tvm/pull/17631 only informs TVM
about VLEN intentions from LLVM side.