giuseros commented on pull request #6095: URL: https://github.com/apache/incubator-tvm/pull/6095#issuecomment-661738954
Hi @FrozenGene,

Before introducing tuning knobs, I wanted to first analyze which minimal set of tuning parameters gives the best performance, so that tuning time stays low. We are sometimes constrained by the number of registers available on AArch64, so trying out different splits might only increase the tuning time without giving any benefit.

So the idea was to start with a "default" schedule that mimics the [ACL implementation](https://github.com/ARM-software/ComputeLibrary/blob/master/src/core/NEON/kernels/convolution/depthwise/impl_qa8_qa8.hpp#L292-L314) and then introduce (the minimal set of) tuning knobs plus tensorization to speed things up.

What do you think? If you want tuning knobs added in this PR, I will try to do the tuning analysis today.
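To illustrate the register-pressure argument, here is a minimal sketch in plain Python. It is not TVM code and is not taken from the PR. The cost model (`registers_needed`), the candidate tile sizes, and the 3x3 stride-1 depthwise kernel are all assumptions made for illustration; the only hard fact used is that AArch64 provides 32 SIMD registers (v0-v31). Under those assumptions, most candidate splits can be ruled out before any tuning run.

```python
from itertools import product

# AArch64 exposes 32 128-bit SIMD registers (v0-v31).
NUM_VECTOR_REGS = 32


def registers_needed(tile_h, tile_w, kernel_h=3, kernel_w=3, stride=1):
    """Hypothetical cost model: one vector register per live value.

    Counts the accumulators for an output tile, the input patch that
    tile reads, and the kernel weights. Real kernels pack data
    differently, so treat this as a rough upper-level estimate only.
    """
    in_h = (tile_h - 1) * stride + kernel_h
    in_w = (tile_w - 1) * stride + kernel_w
    accumulators = tile_h * tile_w
    inputs = in_h * in_w
    weights = kernel_h * kernel_w
    return accumulators + inputs + weights


def feasible_tiles(candidates=(1, 2, 4, 8), budget=NUM_VECTOR_REGS):
    """Return the (tile_h, tile_w) pairs that fit in the register budget."""
    return [
        (h, w)
        for h, w in product(candidates, repeat=2)
        if registers_needed(h, w) <= budget
    ]


if __name__ == "__main__":
    tiles = feasible_tiles()
    total = len((1, 2, 4, 8)) ** 2
    print(f"{len(tiles)} of {total} candidate splits fit: {tiles}")
```

With this toy model only 6 of the 16 candidate splits fit in the register file, which is the kind of pruning that would justify exposing a small set of knobs rather than every possible split.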
