apeskov opened a new pull request, #14345: URL: https://github.com/apache/tvm/pull/14345
The goal is to allow the intrinsics "q_multiply_shift" and "q_multiply_shift_per_axis" to be applied to the vector type i32x128. Originally they support only "i32x32", which is natively supported by the platform (a 1024-bit vector).

**Motivation**

There are situations where we have to use a vector size slightly larger than the platform supports. As an example, consider a sequence of element-wise operators: add&lt;i32&gt; -> q_multiply_shift&lt;i32&gt; -> cast&lt;i8&gt;. To achieve good performance we have to squash them into a single loop (`sch.compute_at(...)`). The first two operators would like to be vectorised using the data type "int32x32", while the last cast operator wants to use i32x128 as its source type and i8x128 as its destination type. As a result, we have to adapt all of these operators to accept the vector size "??x128" in order to vectorise the entire loop successfully.

This change achieves a significant performance speedup for tuning tasks like `conv -> add -> qnn.requantize -> cast_i8`.
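For background, `q_multiply_shift` is the rounding fixed-point multiply that backs `qnn.requantize`. A minimal scalar sketch in Python, modeled on the gemmlowp-style routines (a rounding-doubling high multiply followed by a rounding shift); the function and parameter names here are illustrative, not TVM's exact API:

```python
def q_multiply_shift(x, multiplier, shift):
    """Rounding fixed-point multiply: roughly round(x * multiplier / 2**31),
    followed by a rounding shift by `shift` bits (negative = right shift)."""
    INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

    # Saturating rounding doubling high multiply (the only overflow case
    # is INT32_MIN * INT32_MIN, which saturates to INT32_MAX).
    if x == multiplier == INT32_MIN:
        high = INT32_MAX
    else:
        ab = x * multiplier
        nudge = (1 << 30) if ab >= 0 else 1 - (1 << 30)
        t = ab + nudge
        # C-style truncating division by 2**31 (Python's // would floor).
        high = -((-t) >> 31) if t < 0 else t >> 31

    if shift >= 0:
        return high << shift

    # Rounding arithmetic right shift by -shift bits.
    right = -shift
    mask = (1 << right) - 1
    remainder = high & mask
    threshold = (mask >> 1) + (1 if high < 0 else 0)
    return (high >> right) + (1 if remainder > threshold else 0)
```

For example, with a Q0.31 multiplier of `2**30` (i.e. 0.5) and `shift = 0`, an input of 100 requantizes to 50. Vectorising this intrinsic simply applies the same scalar computation lane-wise, which is why widening the accepted lane count from 32 to 128 does not change the arithmetic.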
