I assume you don't have your own NPU acceleration library, so you can't go the TVM BYOC route.
1. Implement your own quantization algorithm tailored to your NPU, since it may not provide every operation or data type (int64, for example).
2. Consider providing your own Relay graph passes to better support your chip, for example if your NPU has restrictions on operator support or uses its own data layout.
3. Implement your NPU's TVM passes and (maybe) your own schedule primitives, for example to target your NPU's own memory hierarchy.
4. Complete code generation. You mentioned that you don't currently have an LLVM backend, so consider implementing your own code generator for TIR, for example one that emits assembly instructions directly.
5. Talk to the NPU driver team and co-design the API so that it is clear how the compiled binary gets loaded. This affects how the TVM runtime component has to be designed or changed to suit your NPU.
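
To make the code generation step a bit more concrete, here is a minimal sketch of what "emit assembly instructions directly" can look like. It is plain Python and does not use TVM at all: the `Load`, `Add`, `Store` and `For` classes are toy stand-ins for lowered loop-level IR nodes, and the `mov` / `ld` / `add` / `st` / `inc` / `blt` mnemonics belong to a made-up NPU ISA. All of these names are assumptions for illustration only; a real backend would walk TVM's TIR instead and emit whatever your hardware actually defines.

```python
from dataclasses import dataclass
from typing import List, Union


# Toy stand-ins for lowered loop-level IR nodes (NOT TVM classes).
@dataclass
class Load:
    buffer: str
    index: str  # name of the loop variable used as the index


@dataclass
class Add:
    lhs: "Expr"
    rhs: "Expr"


Expr = Union[Load, Add]


@dataclass
class Store:
    buffer: str
    index: str
    value: Expr


@dataclass
class For:
    var: str
    extent: int
    body: List[Store]


class ToyNpuCodegen:
    """Walks the toy IR and emits text for a hypothetical NPU ISA."""

    def __init__(self):
        self.lines = []
        self.next_reg = 0

    def fresh_reg(self):
        # Naive allocation: every intermediate value gets a new register.
        reg = f"r{self.next_reg}"
        self.next_reg += 1
        return reg

    def emit(self, text):
        self.lines.append(text)

    def visit_expr(self, expr):
        # Returns the name of the register holding the expression's value.
        if isinstance(expr, Load):
            reg = self.fresh_reg()
            self.emit(f"  ld   {reg}, {expr.buffer}[{expr.index}]")
            return reg
        if isinstance(expr, Add):
            lhs = self.visit_expr(expr.lhs)
            rhs = self.visit_expr(expr.rhs)
            reg = self.fresh_reg()
            self.emit(f"  add  {reg}, {lhs}, {rhs}")
            return reg
        raise NotImplementedError(type(expr).__name__)

    def visit_for(self, loop):
        label = f"loop_{loop.var}"
        self.emit(f"  mov  {loop.var}, 0")
        self.emit(f"{label}:")
        for store in loop.body:
            value = self.visit_expr(store.value)
            self.emit(f"  st   {store.buffer}[{store.index}], {value}")
        self.emit(f"  inc  {loop.var}")
        self.emit(f"  blt  {loop.var}, {loop.extent}, {label}")


def codegen(loop):
    gen = ToyNpuCodegen()
    gen.visit_for(loop)
    return "\n".join(gen.lines)


# C[i] = A[i] + B[i] for i in range(16)
vector_add = For("i", 16, [Store("C", "i", Add(Load("A", "i"), Load("B", "i")))])
asm = codegen(vector_add)
print(asm)
```

The visitor structure is the part that carries over: one method per IR node type, each appending instructions and returning where its result lives. Register allocation, addressing modes and the memory hierarchy are deliberately ignored here, and those are the places where most of the real work for your NPU would go.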
