apeskov opened a new pull request, #14345:
URL: https://github.com/apache/tvm/pull/14345

   The goal is to allow the intrinsics "q_multiply_shift" and 
"q_multiply_shift_per_axis" to be applied to the vector type i32x128. Originally 
they support only "i32x32", which is what the platform supports natively 
(1024-bit vectors).
   
   **Motivation** 
   There are situations where we have to use a vector size slightly larger than 
the one supported by the platform. As an example, consider a sequence of 
element-wise operators: add<i32> -> q_multiply_shift<i32> -> cast<i8>. To 
achieve good performance we have to fuse them into a single loop 
(`sch.compute_at(...)`). The first two operators would prefer to be vectorized 
with the data type "int32x32", while the last cast operator wants to use 
i32x128 as src and i8x128 as dst. As a result, we have to adapt all of these 
operators to accept the vector size "??x128" in order to vectorize the 
entire loop.
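   The chunking idea behind this change can be sketched in plain Python. This is a simplified model, not the actual TVM implementation: the function names, the `NATIVE_LANES` constant, and the rounding-right-shift semantics are illustrative assumptions (the real intrinsic also handles saturation and per-axis parameters):

```python
NATIVE_LANES = 32  # lanes natively supported, i.e. i32x32 on a 1024-bit platform

def q_multiply_shift_native(xs, m, shift):
    # Stand-in for the platform's native i32x32 intrinsic: a rounding
    # fixed-point multiply-and-shift over at most NATIVE_LANES lanes.
    # (Hypothetical simplification of TVM's q_multiply_shift.)
    assert len(xs) <= NATIVE_LANES
    half = 1 << (shift - 1)  # rounding term
    return [(x * m + half) >> shift for x in xs]

def q_multiply_shift_wide(xs, m, shift):
    # Legalize a wider vector (e.g. i32x128) by splitting it into
    # native-width chunks and concatenating the per-chunk results.
    out = []
    for i in range(0, len(xs), NATIVE_LANES):
        out.extend(q_multiply_shift_native(xs[i:i + NATIVE_LANES], m, shift))
    return out

wide = list(range(128))                       # an i32x128-like input
res = q_multiply_shift_wide(wide, m=3, shift=1)
assert len(res) == 128                        # full width preserved
```

   The point of the sketch is only that the wide form lowers to the already-supported narrow form, so the whole fused loop can share one vector width.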
   
   This change enables a significant performance speedup for tuning 
tasks like `conv -> add -> qnn.requantize -> cast_i8`. 
   
   

