FrozenGene edited a comment on pull request #6095:
URL: https://github.com/apache/incubator-tvm/pull/6095#issuecomment-661756915


   > ACL implementation
   
   Hi @giuseros, thanks for the work. I fully understand your goal and the 
smooth development path you have in mind. Since this schedule will be the 
default NHWC depthwise convolution, my opinion is that we should try to get as 
much performance out of it as we reasonably can. To be clear, I don't mean we 
must match ACL's peak performance before we can merge; optimization is not a 
one-shot deal. But I think we could enable AutoTVM here to help us reach better 
performance, and I think it is worth introducing in this PR.
   
   - This schedule will be applied to both arm32 and arm64, so we shouldn't 
only consider arm64. Letting AutoTVM choose the `split` factors per target 
could help us avoid this issue.
   
   - A tuning knob for `compute_at` (especially for `data_pad`) could help us 
solve the `parallel-compute-locality` issue (we cannot assume the kernel runs 
on only a single core). For more detail, see Figure 2 of 
http://people.csail.mit.edu/jrk/halide-pldi13.pdf
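
   To make the two knobs above concrete, here is a minimal sketch of the search 
space they would open up. It is plain Python and does not call TVM; it only 
mimics, in spirit, how a split knob enumerates factor pairs of an axis. The 
channel count and the three `compute_at` placements for `data_pad` are 
hypothetical values chosen for illustration, not taken from this PR.

```python
from itertools import product


def split_candidates(extent):
    """Return every (outer, inner) pair with outer * inner == extent."""
    return [
        (extent // inner, inner)
        for inner in range(1, extent + 1)
        if extent % inner == 0
    ]


def knob_space(channels, compute_at_options):
    """Cartesian product of channel split choices and compute_at choices."""
    return list(product(split_candidates(channels), compute_at_options))


# Hypothetical values, for illustration only.
CHANNELS = 32
COMPUTE_AT = ["inline", "at_outer_loop", "at_root"]

space = knob_space(CHANNELS, COMPUTE_AT)
print(len(split_candidates(CHANNELS)), "split candidates")
print(len(space), "configurations in total")
```

   Under these assumptions the space stays small (a handful of split pairs 
times three placements), which suggests the extra tuning time could be modest 
while still letting arm32 and arm64 settle on different factors.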
   
   I agree we should keep the number of tuning knobs small and the tuning time 
short, but if a knob helps us improve performance, I think we should introduce 
it; otherwise we can leave it out.



