FrozenGene commented on pull request #5754:
URL: https://github.com/apache/incubator-tvm/pull/5754#issuecomment-642577581
> 1. It will be hard to do this. The point is that the legalization is done
in Relay before picking the strategy (thus, it is unaware of the strategy
picked). To keep both legalizations I need somehow to pass information from the
strategy (e.g., the name of the algorithm, or something like that). Are you
aware of any other ways I can do it?
@giuseros I think add the algorithm name could be one way to handle it. For
example, we could add it in the `attr` and query it in the legalization pass,
then we could throw it safely.
> Note that I am targeting NHWC layout. I wasn't able to even compile with
conv2d_nhwc_spatial_pack for uint8 (it just hangs, at least when I tried it
without auto-tuning on Armv8-A). I gathered from various discussions that NHWC
support for arm targets is incomplete at the moment. So for now we might simply
agree to leave this as default for NHWC and conv2d_nchw_spatial_pack as default
for NCHW and mirror that in the legalization step which might look like:
```if is_aarch64_arm() and attrs.data_layout == "NHWC":
return helper_change_dtypes_to_be_same(attrs, inputs, types,
relay.qnn.op.conv2d)
return helper_no_fast_int8_hw_legalization(attrs, inputs, types,
relay.qnn.op.conv2d)```
Yes, our NHWC schedule on arm cpu doesn't be complete. After our careful
testing, NHWC is also perform better than NCHW on arm cpu using Ansor (aka auto
scheduler) too. So this prompts us we could improve our AutoTVM NHWC schedule
on arm cpu too. As the result I show in the post, we use auto schedule is to
leverage NHWC layout and `smlal` instruction, I prefer we could leverage
`attr[algorithm_name]` mentioned previous to keep `smlal` instruction. After
auto scheduler released (we are working hard to do it, we wish after 2 weeks we
could bring it in), we could see how to improve it (like generating smlal and
smlal2 or your tensorize instruction), they are orthogonal but they share the
same legalize pass.
One background of auto scheduler: In auto scheduler, we only need
tvm.compute, then we could generate schedule automatically, so we could try
NHWC / NCHW easily. So there is no spatial pack schedule template concept in
the auto scheduler world in fact.
> About the Raspberry+mobilenet v2, good to know you are working on Armv8-A
(sorry to have assumed otherwise). However, there is still the point that
mobilenet uses shallow convolutions, while I am addressing deeper and more
generic convolutions.
So we should keep both algorithm better, right?
> Are you saying that, as things stand now in TVM, the
conv2d_nhwc_spatial_pack schedule might be faster than the gemm approach on
smaller CPUs? Unfortunately, for now I don't think they can be added together
because of what I said above about the legalization step. Do you know any
work-around to that? Maybe I can legalize only for specific devices (e.g., only
for Cortex-A55)?
I think add algorithm name mentioned before maybe could help to solve it.
> Finally, as things stand now we might get this PR in, and later do a more
detailed comparison across different networks + CPUs
Ok. I buy it in. After legalization pass we discussed is solved, I am glad
to do code review carefully and handle this pr.
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]