Anndrey24 opened a new pull request, #15484:
URL: https://github.com/apache/tvm/pull/15484

   `topi.arm_cpu.schedule_conv2d_NHWC_quantized_interleaved` failed to compile 
with the `+i8mm` extension enabled (as done in apache/tvm#14888) whenever the 
output height and output width were both equal to 1, i.e. OH x OW = 1.
   
   Padding was being removed during the `tir.BufferShapeLegalize` pass, which 
caused an error in the `tir.BufferBindUnwrapper` pass. Some of the removed 
padding was needed for tensorization with the `gemm_acc_2x2_int8_int8_int32` 
intrinsic, which expects 2x2 output tiles. Because of this padding removal, 
however, the output tensor `C_interleaved` was reduced to 1x2 tiles instead.
   
   e.g. for A = [1x1x1x8], W = [1x1x8x24], C = [1x1x1x24]:
   - Before fix: `C_interleaved = T.Buffer((1, 1, 2, 1, 6, 1, 2), "int32")`
   - After fix: `C_interleaved = T.Buffer((1, 1, 2, 1, 6, 2, 2), "int32")`
   
   To keep the required padding untouched while still removing the rest, a 
dummy reference to the needed axis is declared.
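The before/after shapes in the example can be reproduced with a small pure-Python model of a shape-legalization step. This is a sketch under stated assumptions, not TVM code: the 7-D index mapping (`m = x * TILE_M + 2 * w + s`, `n = y * TILE_N + 2 * z + t`), the tile sizes `TILE_M = 4` and `TILE_N = 12`, and the helpers `full_shape` / `legalized_shape` are all hypothetical, chosen only to match the shapes above. The model shrinks every axis to the extent touched by valid outputs; for M = OH x OW = 1 this collapses the row axis `s` of each 2x2 tile, and keeping axis 5 at full extent plays the role of the dummy reference.

```python
from itertools import product

# Hypothetical model (assumption, not TVM source): C_interleaved element
# [b, x, y, w, z, s, t] stores C[b, m, n] with
#   m = x * TILE_M + 2 * w + s,  n = y * TILE_N + 2 * z + t
TILE_M = 4   # assumed rows per interleaved tile of the output
TILE_N = 12  # assumed columns per interleaved tile (N = 24 -> 2 tiles)


def full_shape(batches, m, n):
    """Padded 7-D shape before any shape legalization."""
    m_pad = -(-m // TILE_M) * TILE_M  # round M up to a multiple of TILE_M
    return (batches, m_pad // TILE_M, n // TILE_N, TILE_M // 2, TILE_N // 2, 2, 2)


def legalized_shape(batches, m, n, keep_axes=()):
    """Shrink each axis to the extent actually touched by valid outputs.

    Axes listed in keep_axes keep their full (padded) extent, mimicking a
    dummy reference that stops the pass from trimming that axis.
    """
    shape = full_shape(batches, m, n)
    used = [0] * len(shape)  # (max index seen + 1) per axis
    for idx in product(*(range(e) for e in shape)):
        b, x, y, w, z, s, t = idx
        row = x * TILE_M + 2 * w + s
        col = y * TILE_N + 2 * z + t
        if row < m and col < n:  # element maps to a real output of C
            used = [max(u, i + 1) for u, i in zip(used, idx)]
    return tuple(shape[a] if a in keep_axes else used[a] for a in range(len(shape)))


print(legalized_shape(1, 1, 24))                  # without the dummy reference
print(legalized_shape(1, 1, 24, keep_axes=(5,)))  # axis 5 kept for the 2x2 tiles
```

In this model, with M = 2 or more both rows of each 2x2 tile hold real outputs, so `legalized_shape(1, 2, 24)[5]` is already 2; that is consistent with only the OH x OW = 1 case failing.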
   
   In the end, the leftover padding is still disregarded when computing the 
final output tensor `C`.
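How the leftover padding is skipped when producing `C` can be sketched in plain Python. The interleaved layout below (index mapping `m = x * TILE_M + 2 * w + s`, `n = y * TILE_N + 2 * z + t`, with assumed tile sizes `TILE_M = 4`, `TILE_N = 12`) and the `unpack` helper are hypothetical illustrations, not the TOPI compute definition: only buffer entries that map inside `C` are read, so the padded row kept for tensorization never reaches the output.

```python
from itertools import product

# Hypothetical layout (assumption, not TVM source):
#   C[b, m, n] = C_interleaved[b, x, y, w, z, s, t] with
#   m = x * TILE_M + 2 * w + s and n = y * TILE_N + 2 * z + t
TILE_M, TILE_N = 4, 12


def unpack(c_interleaved, shape, batches, m, n):
    """Gather the final output C from a 7-D interleaved buffer.

    Entries whose row or column falls outside C are padding and are skipped.
    """
    c = [[[None] * n for _ in range(m)] for _ in range(batches)]
    for idx in product(*(range(e) for e in shape)):
        b, x, y, w, z, s, t = idx
        row = x * TILE_M + 2 * w + s
        col = y * TILE_N + 2 * z + t
        if row < m and col < n:  # padding (e.g. s = 1 when M = 1) is ignored
            c[b][row][col] = c_interleaved[idx]
    return c


# Usage: a buffer with the post-fix shape for M = 1, N = 24.
# Real slots hold their column index; padded slots hold -1.
shape = (1, 1, 2, 1, 6, 2, 2)
buf = {}
for idx in product(*(range(e) for e in shape)):
    _, x, y, w, z, s, t = idx
    row, col = x * TILE_M + 2 * w + s, y * TILE_N + 2 * z + t
    buf[idx] = col if row == 0 else -1
c = unpack(buf, shape, batches=1, m=1, n=24)
```

Every `-1` sits in a padded row, so none of them appears in the unpacked `c`.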

