Lunderberg commented on PR #16183:
URL: https://github.com/apache/tvm/pull/16183#issuecomment-2273924915

   First results: it looks like the implicit conversions aren't necessarily 
causing the overhead.  I benchmarked constructing a TIR buffer, with the 
shape/strides provided either as Python integers or as TIR IntImm.
   
   On `main` (after the FFI changes), the runtime is about the same, regardless 
of how the shape/strides are provided.
   
![image](https://github.com/user-attachments/assets/d4f505ab-ca46-4591-9a53-8aaa519fe71c)
   
   On `5a67a00bcb` (last commit before the FFI changes), it's about 80% slower, 
and there's about a 40% overhead when providing Python integers (`[1,2,3]`) as 
compared to providing TIR IntImm (`[T.int32(1), T.int32(2), T.int32(3)]`).
   
   
![image](https://github.com/user-attachments/assets/57312ae8-7615-4de9-9b5b-2db1e1995fe2)
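   For anyone who wants to reproduce the comparison, here is a minimal sketch of 
how such a benchmark could be set up.  It is not the script that produced the 
plots above.  The helper names (`time_construction`, `relative_overhead`, 
`compare_tvm_buffer_construction`), the use of `tir.decl_buffer` as the 
constructor, the `float32` dtype, and the stride values `[6, 3, 1]` are all 
assumptions made for illustration.

```python
import timeit


def time_construction(make_buffer, shape, strides, repeat=5, number=1000):
    """Best observed per-call time (seconds) of make_buffer(shape, strides)."""
    timer = timeit.Timer(lambda: make_buffer(shape, strides))
    return min(timer.repeat(repeat=repeat, number=number)) / number


def relative_overhead(t_candidate, t_baseline):
    """Fractional slowdown of the candidate relative to the baseline."""
    return (t_candidate - t_baseline) / t_baseline


def compare_tvm_buffer_construction():
    """Time Python-int vs. IntImm arguments.  Requires TVM to be installed."""
    # Imported lazily so the two helpers above can be used without TVM.
    from tvm import tir

    def make_buffer(shape, strides):
        # Assumption: tir.decl_buffer stands in for whichever constructor
        # the original benchmark called.
        return tir.decl_buffer(shape, dtype="float32", strides=strides)

    py_shape, py_strides = [1, 2, 3], [6, 3, 1]
    imm_shape = [tir.IntImm("int32", v) for v in py_shape]
    imm_strides = [tir.IntImm("int32", v) for v in py_strides]

    t_int = time_construction(make_buffer, py_shape, py_strides)
    t_imm = time_construction(make_buffer, imm_shape, imm_strides)
    return t_int, t_imm, relative_overhead(t_int, t_imm)
```

   Running `compare_tvm_buffer_construction()` once on each commit gives the two 
per-call timings and the fractional overhead of the Python-integer path.  Taking 
the minimum over several repeats is a common way to reduce noise from other 
processes, though it does not remove it.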
   
   This is rather surprising, because I expected the overhead to be present 
after the FFI changes, but this benchmark suggests that there's less overhead.
   
   Next up, seeing how the FFI changes impact Relax `R.call_tir` operators.


