Anndrey24 opened a new pull request, #15648:
URL: https://github.com/apache/tvm/pull/15648

   The Legalize pass was unnecessarily padding the input channels for the 
conv2d int8 native implementations. Since the conv2d schedule itself can add 
this padding more efficiently, I removed the padding from the pass and further 
optimised the schedule to handle the unpadded input.  
   For the int8 interleaved implementation, I kept the Legalize pass padding, 
which pads the input channels up to a multiple of 8, and modified the 
schedule to ensure that the input data is vectorized.  
   I also added a test that checks whether the Legalize pass pads the 
conv2d input data.  
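
The "multiple of 8" padding kept for the interleaved path amounts to a round-up of the channel count. A minimal sketch of that arithmetic, not the TVM Legalize code: `round_up` and `padded_input_shape` are hypothetical helper names, and the NHWC layout is an assumption based on the shapes quoted in this description.

```python
def round_up(value, multiple):
    """Round value up to the nearest multiple (hypothetical helper)."""
    return ((value + multiple - 1) // multiple) * multiple


def padded_input_shape(shape_nhwc, multiple=8):
    """Return the NHWC shape after padding the channel axis up to a multiple.

    Assumes the last axis holds the input channels.
    """
    n, h, w, c = shape_nhwc
    return (n, h, w, round_up(c, multiple))


# The benchmark input has 3 channels, so 5 channels of padding are needed.
print(padded_input_shape((1, 224, 224, 3)))  # (1, 224, 224, 8)
```

A channel count that is already a multiple of 8 is returned unchanged, so no padding would be added in that case.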
   
   Benchmark results for a single int8 conv2d operation that requires input 
padding along the K axis, with input = (1 x 224 x 224 x 3), filter = (16 x 
3 x 3 x 3) and output = (1 x 112 x 112 x 16), measured on 3 different targets 
before and after the changes, are given below:
   - generic = `"llvm --device=arm_cpu --mtriple=aarch64-linux-gnu 
-mattr=+v8.2a"` (interleaved)
   - dotprod = `"llvm --device=arm_cpu --mtriple=aarch64-linux-gnu 
-mattr=+v8.2a,+dotprod"` (native)
   - i8mm = `"llvm --device=arm_cpu --mtriple=aarch64-linux-gnu 
-mattr=+v8.2a,+i8mm"` (interleaved)
   
   | Target  | Before (ms) | After (ms) | Speedup (%) |
   |---------|-------------|------------|-------------|
   | generic | 0.13        | 0.1154     | 11.23       |
   | dotprod | 0.5186      | 0.0663     | 87.22       |
   | i8mm    | 0.0982      | 0.0829     | 15.58       |
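
The Speedup (%) column appears consistent with the relative reduction in run time, (before - after) / before * 100. A minimal sketch that recomputes it from the timings in the table; the function name `time_reduction_percent` is hypothetical and the formula is inferred from the numbers, not stated in the PR.

```python
# Timings in milliseconds, copied from the table: (before, after).
results_ms = {
    "generic": (0.13, 0.1154),
    "dotprod": (0.5186, 0.0663),
    "i8mm": (0.0982, 0.0829),
}


def time_reduction_percent(before, after):
    """Relative reduction in run time, as a percentage of the 'before' time."""
    return (before - after) / before * 100.0


for target, (before, after) in results_ms.items():
    print(f"{target}: {time_reduction_percent(before, after):.2f}%")
```

Under this definition, the dotprod figure of 87.22% corresponds to the operation running roughly 7.8 times faster than before.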
   
   cc @ekalda @neildhickey @lhutton1 @leandron


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
