Anndrey24 opened a new pull request, #15648: URL: https://github.com/apache/tvm/pull/15648
The Legalize pass was unnecessarily padding the input channels for the conv2d int8 native implementations. Since the conv2d schedule itself can add padding more efficiently, I skipped padding during the pass and further optimised the schedule to handle it.

For the int8 interleaved implementation, I kept the Legalize pass padding, which pads the input channels to a multiple of 8, and modified the schedule to ensure vectorization of the input data.

I also added a test to check whether or not the Legalize pass pads the conv2d input data.

The benchmark results for a single int8 conv2d operation where input padding along the K axis is necessary, with input = (1 x 224 x 224 x 3), filter = (16 x 3 x 3 x 3), output = (1 x 112 x 112 x 16), tested on 3 different targets before and after the changes, are given below:

- generic = `"llvm --device=arm_cpu --mtriple=aarch64-linux-gnu -mattr=+v8.2a"` (interleaved)
- dotprod = `"llvm --device=arm_cpu --mtriple=aarch64-linux-gnu -mattr=+v8.2a,+dotprod"` (native)
- i8mm = `"llvm --device=arm_cpu --mtriple=aarch64-linux-gnu -mattr=+v8.2a,+i8mm"` (interleaved)

| Target  | Before (ms) | After (ms) | Speedup (%) |
|---------|-------------|------------|-------------|
| generic | 0.13        | 0.1154     | 11.23       |
| dotprod | 0.5186      | 0.0663     | 87.22       |
| i8mm    | 0.0982      | 0.0829     | 15.58       |

cc @ekalda @neildhickey @lhutton1 @leandron
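
For reference, the arithmetic behind the description can be sketched in plain Python. This is only an illustration and not code from the PR: `round_up` and `time_reduction_percent` are hypothetical helper names, and the sketch assumes the "Speedup (%)" column is the relative reduction in run time, `(before - after) / before * 100`, which is consistent with the reported figures.

```python
def round_up(value: int, multiple: int) -> int:
    """Return the smallest multiple of `multiple` that is >= `value`."""
    return ((value + multiple - 1) // multiple) * multiple


def time_reduction_percent(before_ms: float, after_ms: float) -> float:
    """Relative reduction in run time, as a percentage of the 'before' time."""
    return (before_ms - after_ms) / before_ms * 100.0


# Interleaved case: input channels are padded to a multiple of 8,
# so the 3 input channels of the benchmarked conv2d become 8.
in_channels = 3
padded_channels = round_up(in_channels, 8)
print(f"input channels: {in_channels} -> {padded_channels}")

# (before_ms, after_ms) pairs copied from the benchmark figures in the description.
results = {
    "generic": (0.13, 0.1154),
    "dotprod": (0.5186, 0.0663),
    "i8mm": (0.0982, 0.0829),
}
for target, (before_ms, after_ms) in results.items():
    print(f"{target}: {time_reduction_percent(before_ms, after_ms):.2f}%")
```

Under that assumption, the computed percentages round to the values reported for the three targets.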
