guberti opened a new pull request, #12969:
URL: https://github.com/apache/tvm/pull/12969

   This pull request adds fast microTVM DSP schedules for the _optimal_ 
`conv2d` and `depthwise_conv2d` layouts - `NHWC/OHWI` and `NCHW/OIHW`. By 
letting us use the special `SMLAD` DSP instruction _without rearranging data_, 
these layouts remove **25%** of the instructions in the inner loop (for the 
`int16` case).
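
To make the `SMLAD` point concrete, here is a minimal Python sketch that emulates the instruction's arithmetic (two signed 16-bit multiplies plus an accumulate in a single step). It is not code from this PR; the names `smlad` and `dot_int16_smlad` are hypothetical, and the sketch assumes an even number of `int16` elements. When the elements being reduced are already adjacent in memory, each pair can be loaded as one 32-bit word and passed to `SMLAD` as-is, with no shuffling beforehand.

```python
import numpy as np


def smlad(x_word: int, y_word: int, acc: int) -> int:
    """Emulate SMLAD: acc + lo(x)*lo(y) + hi(x)*hi(y) on signed 16-bit halves."""

    def signed_halves(word: int):
        lo = word & 0xFFFF
        hi = (word >> 16) & 0xFFFF
        # Reinterpret each 16-bit half as a signed value.
        lo = lo - 0x10000 if lo & 0x8000 else lo
        hi = hi - 0x10000 if hi & 0x8000 else hi
        return lo, hi

    x_lo, x_hi = signed_halves(x_word)
    y_lo, y_hi = signed_halves(y_word)
    total = (acc + x_lo * y_lo + x_hi * y_hi) & 0xFFFFFFFF
    # Wrap the result to a signed 32-bit value.
    return total - (1 << 32) if total & 0x80000000 else total


def dot_int16_smlad(data, kernel) -> int:
    """Dot product of two int16 vectors, two elements per emulated SMLAD."""
    # Contiguous int16 pairs are reinterpreted as 32-bit words, no reordering.
    data_words = np.ascontiguousarray(data, dtype=np.int16).view(np.uint32)
    kernel_words = np.ascontiguousarray(kernel, dtype=np.int16).view(np.uint32)
    acc = 0
    for x, y in zip(data_words, kernel_words):
        acc = smlad(int(x), int(y), acc)
    return acc
```

Each loop iteration consumes two `int16` elements from each operand, which is why keeping those elements contiguous in both the data and the kernel matters.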
   
   Additionally, this change allows both the `conv2d` and `depthwise_conv2d` 
fast DSP schedules to use the _same underlying intrinsic_, a variation of a 
tensordot operator. This makes the code for these schedules much more compact. 
This PR also:
   
   - Adds unit tests for the new schedules
   - Does not affect the old schedules, which are still used when they apply
     - The new fast schedules apply to a disjoint set of cases
   - Adds support for `int8`, `int16`, and `int32` input data types in the new 
schedules
   - Adds a `change_constant_shape` utility to `tvm.topi`
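
As an illustration of how one tensordot-style routine could serve both operators, the NumPy sketch below computes `conv2d` and `depthwise_conv2d` through the same helper. This is not the intrinsic from this PR: the helper `tensordot_rows`, its signature, the pairing of `conv2d` with `NHWC/OHWI` and `depthwise_conv2d` with `NCHW/OIHW`, and the stride-1, no-padding setup are all assumptions made for the example.

```python
import numpy as np


def tensordot_rows(data, data_offset, row_stride, kernel, kernel_offset, rows, cols):
    """Sum of products over `rows` runs of `cols` contiguous elements.

    Data rows start `row_stride` elements apart; kernel rows are packed.
    """
    acc = 0
    for r in range(rows):
        d_start = data_offset + r * row_stride
        k_start = kernel_offset + r * cols
        d = data[d_start:d_start + cols].astype(np.int64)
        k = kernel[k_start:k_start + cols].astype(np.int64)
        acc += int(np.dot(d, k))
    return acc


def conv2d_nhwc_ohwi(data, kernel):
    n, h, w, c = data.shape
    o, kh, kw, _ = kernel.shape
    out = np.zeros((n, h - kh + 1, w - kw + 1, o), dtype=np.int64)
    d, k = data.ravel(), kernel.ravel()
    for b, y, x, oc in np.ndindex(*out.shape):
        # Each kernel row covers kw * c contiguous elements in both tensors.
        out[b, y, x, oc] = tensordot_rows(
            d, ((b * h + y) * w + x) * c, w * c, k, oc * kh * kw * c, kh, kw * c
        )
    return out


def depthwise_conv2d_nchw_oihw(data, kernel):
    n, c, h, w = data.shape
    _, _, kh, kw = kernel.shape  # kernel shape is (c, 1, kh, kw)
    out = np.zeros((n, c, h - kh + 1, w - kw + 1), dtype=np.int64)
    d, k = data.ravel(), kernel.ravel()
    for b, ch, y, x in np.ndindex(*out.shape):
        # Each kernel row covers kw contiguous elements within one channel.
        out[b, ch, y, x] = tensordot_rows(
            d, ((b * c + ch) * h + y) * w + x, w, k, ch * kh * kw, kh, kw
        )
    return out
```

In both layouts the reduction runs over contiguous runs of memory in the data and the kernel, so only the offsets, the row stride, and the row length differ between the two operators.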
   
   This pull request **does not** change the strategy functions to call these 
new fast schedules - this will occur in a follow-up PR. I've also written a 
comment below explaining why these layouts are optimal.

