Re: [apache/incubator-tvm] [RFC] Improve quantized convolution performance for armv8 architectures (#5754)

Zhao Wu Thu, 11 Jun 2020 02:53:15 -0700

Glad to see we share the same thought: we should let AutoTVM select the best.


The auto scheduler relies on the legalization pass to generate the smlal 
instruction. (After the auto scheduler is released, let us make it better 
together.) One detail I left out before: the OS on my test Raspberry Pi 3B+ is 
64-bit Ubuntu, not 32-bit, so the target is aarch64 too. 

I mention the auto scheduler not to question your work (your work is great!); 
it is orthogonal, as you said. My point is only that using the smlal 
instruction on the A53 (with the aarch64 OS mentioned above) also gives good 
performance. So I want to know whether, on low-end Arm CPUs, smlal is better 
than this work. As the Facebook QNNPACK blog says: "The default microkernel 
uses the fewest possible instructions and thus delivers the best performance on 
low-end cores, which can execute only one NEON instruction per cycle."

So I hope we can test several Arm CPUs to show that this work performs well on 
all aarch64 cores (both low-end and high-end).

Secondly, I suggest we also test MobileNet V2, to see whether the PR works 
well across various models. 

Your work is great, but I hope we can use more data and results to make it 
more convincing. 


-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/incubator-tvm/pull/5754#issuecomment-642541198
