ekalda opened a new pull request, #13669:
URL: https://github.com/apache/tvm/pull/13669

   topi.arm_cpu.schedule_conv2d_NHWC_quantized_native was failing to compile 
when the number of input channels divided by 4 was less than 4.
   
   This was because we were splitting this axis by a factor of 4 to create the 
appropriate loop nest for tensorize, but tensorize then assumed that the 
outer axis bound was divisible by 4.
   
   If the outer bound was less than 4, compilation failed; if it was greater 
than 4 but not divisible by 4, we were occasionally accessing data outside of 
the tensor, which luckily was padded due to alignment (I think).
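A rough model of the out-of-bounds access described above, assuming the tensorized intrinsic consumes the outer channel axis in whole blocks of 4 iterations (4 x 4 = 16 channels). The constants and the name `channel_access_range` are illustrative and are not taken from the TVM schedule:

```python
import math

SPLIT_FACTOR = 4    # inner factor of the input-channel split
TENSORIZE_STEP = 4  # assumed: outer iterations consumed per intrinsic call


def channel_access_range(in_channels):
    """Return (outer_bound, highest channel index read) without padding.

    Hypothetical model: a partial final block still reads a full block
    of TENSORIZE_STEP * SPLIT_FACTOR channels.
    """
    outer = math.ceil(in_channels / SPLIT_FACTOR)
    blocks = math.ceil(outer / TENSORIZE_STEP)
    last_channel = blocks * TENSORIZE_STEP * SPLIT_FACTOR - 1
    return outer, last_channel


for c in (8, 24, 64):
    outer, last = channel_access_range(c)
    print(c, outer, last, "out of bounds" if last >= c else "ok")
```

With 24 input channels the outer bound is 6, which is not divisible by 4, so two full blocks read channels up to index 31 of a 24-channel axis. With 8 channels the outer bound is 2, the case that failed to compile outright.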
   
   So here we explicitly pad the input axis so that the outer loop bound is 
always divisible by 4.
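The padding arithmetic implied by this fix can be sketched as follows: with an inner split factor of 4 and an outer bound that must be a multiple of 4, the channel axis has to be rounded up to a multiple of 16. The helper name `padded_in_channels` is hypothetical; this is a sketch of the arithmetic, not the code in the PR:

```python
def padded_in_channels(in_channels, split_factor=4, outer_multiple=4):
    """Smallest channel count >= in_channels whose outer split bound
    (padded // split_factor) is a multiple of outer_multiple."""
    block = split_factor * outer_multiple  # 16 channels for the defaults
    return -(-in_channels // block) * block  # ceiling division, then scale


for c in (3, 8, 24, 64):
    p = padded_in_channels(c)
    print(c, "->", p, "outer bound:", p // 4, "pad by:", p - c)
```

Padding with zeros leaves the int8 dot products unchanged, which is why extra channels can be added without affecting the result.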
   
   There are also some refactors to test_topi_conv2d_int8.py:
   - decouple the tests using pytest.mark.parametrize
   - extend the NHWC int8 schedules test to run against Arm targets and 
various schedules. When these schedules were initially added, we didn't have Arm 
CI, so only compilation was tested; now we can also run the workloads on Arm 
targets.
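A minimal sketch of how stacked `pytest.mark.parametrize` decorators turn one monolithic test into independent cases (targets x channel counts). The target strings and channel counts are assumptions for illustration, and the body only checks the padding invariant; the real test would build and run the conv2d for each target:

```python
import pytest

# Illustrative Arm target strings (assumed, not copied from the PR).
TARGETS = [
    "llvm -device=arm_cpu -mtriple=aarch64-linux-gnu -mattr=+neon",
    "llvm -device=arm_cpu -mtriple=aarch64-linux-gnu -mattr=+v8.2a,+dotprod",
]

# Channel counts covering outer bound < 4, not divisible by 4, and divisible by 4.
IN_CHANNELS = [8, 24, 64]


@pytest.mark.parametrize("target", TARGETS)
@pytest.mark.parametrize("in_channels", IN_CHANNELS)
def test_conv2d_nhwc_int8(target, in_channels):
    # Stand-in body: a real test would compile and run for `target` here.
    block = 4 * 4  # split factor 4, outer bound a multiple of 4
    padded = -(-in_channels // block) * block
    assert padded >= in_channels
    assert (padded // 4) % 4 == 0
```

Each (target, in_channels) pair is reported as a separate test, so one failing workload no longer hides the others.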
