Hi @FrozenGene,

Just to clarify: I am enjoying the discussion, and since the optimization space is wild, I agree that it is worth evaluating different approaches.

* About the Raspberry+mobilenet v2: good to know you are working on Armv8-A (sorry to have assumed otherwise). However, there is still the point that mobilenet uses shallow convolutions, while I am addressing deeper and more generic convolutions.
* Are you saying that, as things stand now in TVM, the `conv2d_nhwc_spatial_pack` schedule might be faster than the gemm approach on smaller CPUs? Unfortunately, for now I don't think they can be added together because of what I said above about the legalization step. Do you know of any workaround for that? Maybe I can legalize only for specific devices (e.g., only for Cortex-A55)?
* Finally, as things stand now, we might get this PR in and later do a more detailed comparison across different networks and CPUs.
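
To make the per-device legalization idea a bit more concrete, here is a minimal sketch of the kind of check I have in mind. It is plain Python and does not use any TVM API: `parse_mcpu`, `should_legalize`, and the `LEGALIZE_CPUS` allow-list are hypothetical names, and I am assuming the CPU is given through an `-mcpu=` flag in the target string.

```python
import re

# Hypothetical allow-list of CPUs for which the legalization step would run.
LEGALIZE_CPUS = {"cortex-a55"}


def parse_mcpu(target_str):
    """Return the lower-cased value of the -mcpu= flag, or None if absent."""
    match = re.search(r"-mcpu=([\w.\-]+)", target_str)
    return match.group(1).lower() if match else None


def should_legalize(target_str, cpus=LEGALIZE_CPUS):
    """Decide whether to legalize, based only on the CPU named in the target."""
    return parse_mcpu(target_str) in cpus


if __name__ == "__main__":
    for target in (
        "llvm -device=arm_cpu -mtriple=aarch64-linux-gnu -mcpu=cortex-a55",
        "llvm -device=arm_cpu -mtriple=aarch64-linux-gnu -mcpu=cortex-a72",
        "llvm -device=arm_cpu -mtriple=aarch64-linux-gnu",
    ):
        print(target, "->", should_legalize(target))
```

A target without an `-mcpu=` flag is not legalized in this sketch, so the existing behaviour would stay the default.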