guberti opened a new pull request, #12856:
URL: https://github.com/apache/tvm/pull/12856

   For a while, I've intended to fix my depthwise_conv2d schedule so that its 
unique weight repacking scheme happens at compile time instead of during 
inference. While working on this, though, I discovered that the `SMLAD` 
instruction we use to compute multiplications in parallel does not actually 
save time.
   
   Recall that the `SMLAD` instruction takes two `int16*2` values `x1::x2` and 
`y1::y2` and an accumulator `z`, and computes `z += x1 * y1 + x2 * y2`. For 
`NHWC` layouts, however, the relevant `x1::x2` values in the input tensor are 
not adjacent in memory. Previously, we used a DSP-specific halfword 
packing instruction `__PKHBT` to fix this, and then called `__SMLAD` afterward - 
two instructions for two multiplies. This is also what CMSIS-NN's [most 
optimized depthwise convolution code 
does](https://github.com/ARM-software/CMSIS_5/blob/dde5bac01b1b0b5ef528989a3139ce10bb1b054d/CMSIS/NN/Source/ConvolutionFunctions/arm_depthwise_conv_s8_opt.c#L319-L353).
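   To make the cost of the old approach concrete, here is a portable plain-C sketch of what the two instructions compute. These are *not* the CMSIS intrinsics themselves, just illustrative models of their semantics (halfword packing, then a dual signed multiply-accumulate); the `model_*` names are made up for this example.

   ```c
   #include <stdint.h>

   /* Model of PKHBT Rd, Rn, Rm, LSL #shift:
    * result[15:0]  = x[15:0]
    * result[31:16] = (y << shift)[31:16]
    * Used here to gather two non-adjacent int16 values into one register. */
   static inline uint32_t model_pkhbt(uint32_t x, uint32_t y, unsigned shift) {
       return (x & 0x0000FFFFu) | ((y << shift) & 0xFFFF0000u);
   }

   /* Model of SMLAD: treat each register as two packed int16 lanes and
    * compute z + x_lo*y_lo + x_hi*y_hi. */
   static inline int32_t model_smlad(uint32_t x, uint32_t y, int32_t z) {
       int16_t x_lo = (int16_t)(x & 0xFFFFu), x_hi = (int16_t)(x >> 16);
       int16_t y_lo = (int16_t)(y & 0xFFFFu), y_hi = (int16_t)(y >> 16);
       return z + (int32_t)x_lo * y_lo + (int32_t)x_hi * y_hi;
   }
   ```

   With `NHWC` data, the two `x` halfwords start in different registers, so every pair of multiplies costs a `PKHBT` (to pack them) plus an `SMLAD`: two instructions for two multiply-accumulates.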
   
   However, there is a lesser-known non-DSP instruction `SMLAxy` that is 
present on **all** Cortex-M cores (see 
[docs](https://developer.arm.com/documentation/dui0068/b/ARM-Instruction-Reference/ARM-multiply-instructions/SMLAxy)).
 This instruction lets us read only one 16-bit half of an `int32` register 
while performing a multiply-accumulate, so we can skip the `PKHBT` 
instruction entirely. Doing the multiplies this way is just as fast, while 
being simpler and more versatile.
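   The `x` and `y` suffixes select which halfword (`B`ottom or `T`op) of each operand is used. A hedged plain-C model of two of the four variants (again, illustrative `model_*` helpers, not real intrinsics):

   ```c
   #include <stdint.h>

   /* Model of SMLABB: z + x[15:0] * y[15:0], halves treated as int16. */
   static inline int32_t model_smlabb(uint32_t x, uint32_t y, int32_t z) {
       return z + (int32_t)(int16_t)(x & 0xFFFFu) * (int16_t)(y & 0xFFFFu);
   }

   /* Model of SMLATB: z + x[31:16] * y[15:0]. */
   static inline int32_t model_smlatb(uint32_t x, uint32_t y, int32_t z) {
       return z + (int32_t)(int16_t)(x >> 16) * (int16_t)(y & 0xFFFFu);
   }
   ```

   Because each variant can pull its halfword from either end of either register, two `SMLAxy` instructions perform two multiply-accumulates regardless of where the operands sit - the same throughput as `PKHBT` + `SMLAD`, with no packing step and no layout constraint.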
   
   This PR changes the Cortex-M depthwise convolution setup to use `SMLAxy` 
instead of `SMLAD`. It also removes the 3x3 kernel restriction and the 
complicated kernel packing mechanism. The net effect is that this schedule is 
now slightly faster for kernels with an odd number of entries (about 10% for 
3x3 kernels).
   
   This is still a draft PR, as I need to resolve an issue where `topi.reshape` 
introduces redundant instructions.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
