I assume you don't have your own NPU acceleration library, so you can't go the 
TVM BYOC (Bring Your Own Codegen) route.

First, you should implement your own quantization algorithm tailored to your 
NPU, since the NPU may not support every operation or data type (int64, for 
example).
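To make the quantization step a bit more concrete, here is a minimal sketch in plain Python. It assumes symmetric per-tensor int8 quantization, which is only one common scheme and may not be what your NPU needs. The function names `quantize_int8` and `dequantize_int8` are made up for illustration and are not TVM APIs.

```python
# Sketch only: symmetric per-tensor int8 quantization (an assumed scheme).
# The function names are hypothetical, not part of TVM.

def quantize_int8(values):
    """Map floats to int8 codes using a single scale for the whole tensor."""
    max_abs = max((abs(v) for v in values), default=0.0)
    # Fall back to scale 1.0 for an all-zero (or empty) tensor.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer and clamp to the symmetric range [-127, 127].
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes, scale):
    """Recover approximate float values from the int8 codes."""
    return [c * scale for c in codes]


weights = [0.4, -1.0, 0.25, 0.0]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)
```

The round-trip error per element is bounded by half of `scale`. A real flow would also need calibration data and per-operator rules for whatever data types your hardware actually supports.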

Second, you should consider providing your own Relay graph passes for better 
support on your chip (for example, if your NPU restricts which operators it 
supports, or has its own data layout requirements).
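As a rough illustration of what such a graph pass does, here is a toy version in plain Python. It does not use the real Relay pass infrastructure. The `Node` class, the supported-operator set, and the `NHWC` layout are all assumptions made up for this sketch, and the graph is simplified to a linear chain of operators.

```python
# Sketch only: a toy graph pass over a linear chain of operators.
# Node, NPU_SUPPORTED_OPS and NPU_LAYOUT are hypothetical, not TVM/Relay APIs.
from dataclasses import dataclass, field


@dataclass
class Node:
    op: str
    attrs: dict = field(default_factory=dict)


NPU_SUPPORTED_OPS = {"conv2d", "relu", "add"}  # assumed operator coverage
NPU_LAYOUT = "NHWC"                            # assumed native data layout


def legalize(nodes):
    """Tag each op with a target and insert layout transforms where needed."""
    result = []
    for node in nodes:
        attrs = dict(node.attrs)  # copy so the input graph is not mutated
        attrs["target"] = "npu" if node.op in NPU_SUPPORTED_OPS else "cpu"
        layout = attrs.get("layout")
        if attrs["target"] == "npu" and layout not in (None, NPU_LAYOUT):
            # Convert the data into the NPU's layout before the operator runs.
            result.append(Node("layout_transform",
                               {"src": layout, "dst": NPU_LAYOUT}))
            attrs["layout"] = NPU_LAYOUT
        result.append(Node(node.op, attrs))
    return result


graph = [Node("conv2d", {"layout": "NCHW"}), Node("relu"), Node("softmax")]
legalized = legalize(graph)
```

In this example `softmax` is not in the supported set, so it gets tagged for CPU fallback, and the `conv2d` gets a layout transform in front of it. In real TVM you would write this as a Relay pass; Relay already ships a `ConvertLayout` pass that is worth checking before writing your own.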

Third, you should implement TVM passes for your NPU and (maybe) your own 
schedule primitives (for example, to handle your NPU's own memory hierarchy).
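To show why the memory hierarchy matters for scheduling, here is a small plain-Python sketch of tiling a computation so that each tile fits in an on-chip buffer. The buffer size `SCRATCHPAD_ELEMS` and the function `tiled_add` are hypothetical; a real schedule primitive would express the same idea on TIR loops and buffers instead of Python lists.

```python
# Sketch only: tile an elementwise add so each tile fits an on-chip buffer.
# SCRATCHPAD_ELEMS and tiled_add are hypothetical names for illustration.

SCRATCHPAD_ELEMS = 4  # assumed capacity of the NPU's on-chip buffer


def tiled_add(a, b, tile=SCRATCHPAD_ELEMS):
    """Add two equal-length vectors one tile at a time."""
    if len(a) != len(b):
        raise ValueError("inputs must have the same length")
    out = [0] * len(a)
    for start in range(0, len(a), tile):
        end = min(start + tile, len(a))  # the last tile may be partial
        # Stand-in for copying from global memory into the on-chip buffer.
        buf_a = a[start:end]
        buf_b = b[start:end]
        # Compute on the on-chip data only.
        buf_out = [x + y for x, y in zip(buf_a, buf_b)]
        # Stand-in for writing the tile back to global memory.
        out[start:end] = buf_out
    return out


total = tiled_add(list(range(10)), [1] * 10)
```

The split into "copy in, compute, copy out" per tile is the part your schedule primitive and lowering passes would have to generate for your hardware.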

Fourth, you should complete code generation. You mentioned that you don't 
currently have an LLVM backend. In that case, you should consider implementing 
your own code generation for TIR (for example, emitting assembly instructions 
directly).
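Emitting assembly directly can be sketched like this in plain Python. The three-address statement format, the mnemonics `VADD` / `VMUL`, and the function `emit_asm` are all invented for this example. A real code generator would walk TIR statements and target your NPU's actual ISA.

```python
# Sketch only: turn a tiny three-address IR into assembly text.
# The IR format, the mnemonics and emit_asm are hypothetical.

MNEMONICS = {"add": "VADD", "mul": "VMUL"}  # assumed instruction names


def emit_asm(stmts):
    """stmts is a list of (dst, op, lhs, rhs) tuples; returns assembly text."""
    lines = []
    for dst, op, lhs, rhs in stmts:
        if op not in MNEMONICS:
            # Surface unsupported operations early instead of emitting garbage.
            raise NotImplementedError(f"no instruction for op '{op}'")
        lines.append(f"{MNEMONICS[op]} {dst}, {lhs}, {rhs}")
    return "\n".join(lines)


program = [("r2", "mul", "r0", "r1"), ("r3", "add", "r2", "r0")]
asm = emit_asm(program)
```

Raising on an unknown operation is a deliberate choice here: it tells you which operators still need to be lowered or rejected earlier in the graph passes.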

Fifth, you should talk to the NPU driver team and co-design the API so that 
we know how to load the compiled binary. This will affect how we design or 
change the TVM runtime components to suit your NPU.
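One thing to agree on with the driver team is the binary format itself. Here is a plain-Python sketch of a container with a small header that the runtime side can validate before handing the code to the driver. The magic value `NPU0`, the header layout, and the functions `pack_module` / `load_module` are assumptions for illustration only, not an existing TVM or driver format.

```python
# Sketch only: a made-up container format for a compiled NPU binary.
# MAGIC, the header layout, pack_module and load_module are hypothetical.
import struct

MAGIC = b"NPU0"
HEADER = struct.Struct("<4sII")  # magic, format version, code size in bytes


def pack_module(code, version=1):
    """Compiler side: prepend a header to the raw code bytes."""
    return HEADER.pack(MAGIC, version, len(code)) + code


def load_module(blob):
    """Runtime side: validate the header and return (version, code)."""
    if len(blob) < HEADER.size:
        raise ValueError("blob is shorter than the header")
    magic, version, size = HEADER.unpack_from(blob, 0)
    if magic != MAGIC:
        raise ValueError("not an NPU module")
    code = blob[HEADER.size:HEADER.size + size]
    if len(code) != size:
        raise ValueError("truncated code section")
    return version, code


blob = pack_module(b"\x01\x02\x03")
version, code = load_module(blob)
```

In practice the runtime would pass `code` to a driver call agreed with the driver team, and the header would probably also need to describe entry points and memory requirements.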




