> 1. It will be hard to do this. The point is that the legalization is done in > Relay before picking the strategy (thus, it is unaware of the strategy > picked). To keep both legalizations I need somehow to pass information from > the strategy (e.g., the name of the algorithm, or something like that). Are > you aware of any other ways I can do it?
@giuseros I think add the algorithm name could be one way to handle it. For example, we could add it in the `attr` and query it in the legalization pass, then we could throw it safely. > Note that I am targeting NHWC layout. I wasn't able to even compile with > conv2d_nhwc_spatial_pack for uint8 (it just hangs, at least when I tried it > without auto-tuning on Armv8-A). I gathered from various discussions that > NHWC support for arm targets is incomplete at the moment. So for now we might > simply agree to leave this as default for NHWC and conv2d_nchw_spatial_pack > as default for NCHW and mirror that in the legalization step which might look > like: ```if is_aarch64_arm() and attrs.data_layout == "NHWC": return helper_change_dtypes_to_be_same(attrs, inputs, types, relay.qnn.op.conv2d) return helper_no_fast_int8_hw_legalization(attrs, inputs, types, relay.qnn.op.conv2d)``` Yes, our NHWC schedule on arm cpu doesn't be complete. After our careful testing, NHWC is also perform better than NCHW on arm cpu using Ansor (aka auto scheduler) too. So this prompts us we could improve our AutoTVM NHWC schedule on arm cpu too. As the result I show in the post, we use auto schedule is to leverage NHWC layout and `smlal` instruction, I prefer we could leverage `attr[algorithm_name]` mentioned previous to keep `smlal` instruction. After auto scheduler released (we are working hard to do it, we wish after 2 weeks we could bring it in), we could see how to improve it (like generating smlal and smlal2 or your tensorize instruction), they are orthogonal but they share the same legalize pass. One background of auto scheduler: In auto scheduler, we only need tvm.compute, then we could generate schedule automatically, so we could try NHWC / NCHW easily. So there is no spatial pack schedule template concept in the auto scheduler world in fact. > About the Raspberry+mobilenet v2, good to know you are working on Armv8-A > (sorry to have assumed otherwise). However, there is still the point that > mobilenet uses shallow convolutions, while I am addressing deeper and more > generic convolutions. So we should keep both algorithm better, right? > Are you saying that, as things stand now in TVM, the conv2d_nhwc_spatial_pack > schedule might be faster than the gemm approach on smaller CPUs? > Unfortunately, for now I don't think they can be added together because of > what I said above about the legalization step. Do you know any work-around to > that? Maybe I can legalize only for specific devices (e.g., only for > Cortex-A55)? I think add algorithm name mentioned before maybe could help to solve it. > Finally, as things stand now we might get this PR in, and later do a more > detailed comparison across different networks + CPUs Ok. I buy it in. After legalization pass we discussed is solved, I am glad to do code review carefully and handle this pr. -- You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub: https://github.com/apache/incubator-tvm/pull/5754#issuecomment-642577581