Wheest opened a new pull request #6137: URL: https://github.com/apache/incubator-tvm/pull/6137
This pull request replaces the current grouped direct convolution algorithm on x86 and Arm targets with the faster Grouped Spatial Pack Convolutions (GSPC) algorithm.

A performance comparison graph for ResNet34 on a single big core of a Hikey 970, as the number of groups increases, is included in the pull request at the URL above. Note that in the untuned case the current depthwise convolution outperforms GSPC, so I have omitted the depthwise case from this pull request.

This is my first proper pull request to TVM, so there may be issues I haven't spotted, or style problems.

In short, this commit adds identical GSPC compute definitions and schedules for the x86 and arm_cpu targets for grouped convolutions, and updates the Relay operator strategy for each.

----------------------------------------------------------------
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: [email protected]
