ekalda opened a new pull request, #13669: URL: https://github.com/apache/tvm/pull/13669
topi.arm_cpu.schedule_conv2d_NHWC_quantized_native was failing compilation in case the input channels divided by 4 was less than 4. This was because we were splitting this axis by a factor of 4 to create appropriate loop nest for tensorize, but then tensorize was assuming that the outer axis bound was divisible by 4. If the outer bound was less than 4, compilation failed, if it was greater than 4 but not divisible by 4, we were occasionally accessing data outside of tensor, which luckily was padded due to alignment (I think). So here we make sure that we explicitly pad the input axis such that the outer loop will always be divisible by 4. There are also some refactors to test_topi_conv2d_int8.py: - decouple the tests using pytest.parametrize - extend the NHWC int8 schedules test to test against arm targets and various schedules. When these schedules were initialy added, we didn't have Arm CI, so only compilation was tested, now we can also run the workloads on Arm targets. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
