guberti opened a new pull request, #12856: URL: https://github.com/apache/tvm/pull/12856
For a while, I've intended to fix my depthwise_conv2d schedule so that its unique weight repacking scheme happens at compile time instead of during inference. While working on this, though, I discovered that the `SMLAD` instruction we use to compute multiplications in parallel does not actually save time. Recall that the `SMLAD` instruction takes two `int16*2` values `x1::x2` and `y1::y2` and an accumulator `z`, and computes `z += x1 * y1 + x2 * y2`. For `NHWC` layouts, however, the relevant `x1::x2` values in the input tensor are not adjacent in memory. Previously, we used a DSP-specific halfword packing instruction `__PKHBT` to fix this, and then called `__SMLAD` afterwards: two instructions for two multiplies. This is also what CMSIS-NN's [most optimized depthwise convolution code does](https://github.com/ARM-software/CMSIS_5/blob/dde5bac01b1b0b5ef528989a3139ce10bb1b054d/CMSIS/NN/Source/ConvolutionFunctions/arm_depthwise_conv_s8_opt.c#L319-L353).

However, there is a lesser-known non-DSP instruction `SMLAxy` that is present on **all** Cortex-M cores (see [docs](https://developer.arm.com/documentation/dui0068/b/ARM-Instruction-Reference/ARM-multiply-instructions/SMLAxy)). This instruction lets us read only one 16-bit half of an int32 register while performing a multiply-accumulate, allowing us to skip the `PKHBT` instruction. Doing the multiplies this way is just as fast, while being simpler and far more versatile.

This PR changes the Cortex-M depthwise convolution setup to use `SMLAxy` instead of `SMLAD`. It also removes the 3x3 kernel restriction and the complicated kernel packing mechanism. The net effect is that this schedule has become slightly faster (about 10% for 3x3 kernels) for kernels with an odd number of entries.

This is still a draft PR, as I need to resolve an issue where `topi.reshape` introduces redundant instructions.
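For readers unfamiliar with these instructions, here is a plain-C sketch of their semantics. These are illustrative emulations written for this description, not the real CMSIS intrinsics or a model of the hardware pipeline; `pack16x2` is a hypothetical helper for building the packed `int16*2` operands:

```c
#include <stdint.h>

/* Hypothetical helper: pack two int16 values into one 32-bit word,
 * with `lo` in the bottom halfword. */
static uint32_t pack16x2(int16_t lo, int16_t hi) {
    return (uint16_t)lo | ((uint32_t)(uint16_t)hi << 16);
}

/* SMLAD semantics: acc += x_lo * y_lo + x_hi * y_hi, i.e. two
 * int16 multiplies in one instruction -- but both halfword pairs
 * must already sit packed in the same registers. */
static int32_t smlad(uint32_t x, uint32_t y, int32_t acc) {
    int16_t x_lo = (int16_t)(x & 0xFFFF), x_hi = (int16_t)(x >> 16);
    int16_t y_lo = (int16_t)(y & 0xFFFF), y_hi = (int16_t)(y >> 16);
    return acc + x_lo * y_lo + x_hi * y_hi;
}

/* SMLABB semantics: acc += (bottom half of x) * (bottom half of y). */
static int32_t smlabb(uint32_t x, uint32_t y, int32_t acc) {
    return acc + (int16_t)(x & 0xFFFF) * (int16_t)(y & 0xFFFF);
}

/* SMLATT semantics: acc += (top half of x) * (top half of y).
 * Reading just one half per operand means no repacking is needed
 * when the halfwords are not adjacent in memory. */
static int32_t smlatt(uint32_t x, uint32_t y, int32_t acc) {
    return acc + (int16_t)(x >> 16) * (int16_t)(y >> 16);
}
```

Two `SMLAxy` issues compute the same two-element dot product as a `PKHBT` + `SMLAD` pair, so the instruction count is the same, but without the requirement that the input halfwords first be packed side by side.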
