FrozenGene commented on pull request #5754:
URL: https://github.com/apache/incubator-tvm/pull/5754#issuecomment-642577581


   > 1. It will be hard to do this. The point is that the legalization is done 
in Relay before picking the strategy (thus, it is unaware of the strategy 
picked). To keep both legalizations I need somehow to pass information from the 
strategy (e.g., the name of the algorithm, or something like that). Are you 
aware of any other ways I can do it?
   
   @giuseros I think add the algorithm name could be one way to handle it. For 
example, we could add it in the `attr` and query it in the legalization pass, 
then we could throw it safely.
   
   > Note that I am targeting NHWC layout. I wasn't able to even compile with 
conv2d_nhwc_spatial_pack for uint8 (it just hangs, at least when I tried it 
without auto-tuning on Armv8-A). I gathered from various discussions that NHWC 
support for arm targets is incomplete at the moment. So for now we might simply 
agree to leave this as default for NHWC and conv2d_nchw_spatial_pack as default 
for NCHW and mirror that in the legalization step which might look like:
       ```if is_aarch64_arm() and attrs.data_layout == "NHWC":
           return helper_change_dtypes_to_be_same(attrs, inputs, types, 
relay.qnn.op.conv2d)
       return helper_no_fast_int8_hw_legalization(attrs, inputs, types, 
relay.qnn.op.conv2d)```
   
   Yes, our NHWC schedule on arm cpu doesn't be complete. After our careful 
testing, NHWC is also perform better than NCHW on arm cpu using Ansor (aka auto 
scheduler) too. So this prompts us we could improve our AutoTVM NHWC schedule 
on arm cpu too.  As the result I show in the post,  we use auto schedule is to 
leverage NHWC layout and `smlal` instruction,  I prefer we could leverage 
`attr[algorithm_name]` mentioned previous to keep `smlal` instruction. After 
auto scheduler released (we are working hard to do it, we wish after 2 weeks we 
could bring it in), we could see how to improve it (like generating smlal and 
smlal2 or your tensorize instruction), they are orthogonal but they share the 
same legalize pass.
   
   One background of auto scheduler: In auto scheduler, we only need 
tvm.compute, then we could generate schedule automatically, so we could try 
NHWC / NCHW easily. So there is no spatial pack schedule template concept in 
the auto scheduler world in fact.
   
   > About the Raspberry+mobilenet v2, good to know you are working on Armv8-A 
(sorry to have assumed otherwise). However, there is still the point that 
mobilenet uses shallow convolutions, while I am addressing deeper and more 
generic convolutions.
   
   So we should keep both algorithm better, right?
   
   
   > Are you saying that, as things stand now in TVM, the 
conv2d_nhwc_spatial_pack schedule might be faster than the gemm approach on 
smaller CPUs? Unfortunately, for now I don't think they can be added together 
because of what I said above about the legalization step. Do you know any 
work-around to that? Maybe I can legalize only for specific devices (e.g., only 
for Cortex-A55)?
   
   I think add algorithm name mentioned before maybe could help to solve it.
   
   > Finally, as things stand now we might get this PR in, and later do a more 
detailed comparison across different networks + CPUs
   
   Ok. I buy it in. After legalization pass we discussed is solved, I am glad 
to do code review carefully and handle this pr.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to