Glad to see we have the same thought: we should let AutoTVM select the best.
The auto scheduler relies on the legalization pass to generate the smlal instruction. (After the auto scheduler is released, let us make it better together.) One piece of information I left out before: the OS on my test Raspberry Pi 3B+ is 64-bit Ubuntu, not 32-bit, so the target is aarch64 too.

I mention the auto scheduler not to question your work (your work is great!); it is orthogonal, as you said. I only want to point out that using the smlal instruction on the A53 (under the aarch64 OS mentioned above) also gives good performance. So I want to know whether, on low-end Arm CPUs, smlal is better than this approach. As the FB QNNPACK blog said: "The default microkernel uses the fewest possible instructions and thus delivers the best performance on low-end cores, which can execute only one NEON instruction per cycle." So I would like us to test several Arm CPUs to prove that this work performs well on all aarch64 cores (low-end and high-end).

Secondly, I suggest we test MobileNet V2 too, to see whether our PR works well across various models. Your work is great, but I would like us to use more data and results to make it more convincing.

View it on GitHub: https://github.com/apache/incubator-tvm/pull/5754#issuecomment-642541198