### RFC
This PR is based on the following RFC: https://discuss.tvm.ai/t/rfc-improve-quantized-convolution-performance-for-armv8-architectures/6920

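
As background for the file list in the description: the schedule file is named `conv2d_gemm.py`, which indicates that the convolution is computed as a matrix multiplication (GEMM), and the weight transform is said to be computed at compile time. The sketch below illustrates that general idea only. It assumes an im2col-style lowering, an NHWC input with a single image, HWIO weights, stride 1, and no padding; none of these details are taken from the PR. All function names are hypothetical, and Python integers stand in for the int32 accumulators a real int8 kernel would use.

```python
# Hypothetical sketch: int8 conv2d computed as im2col + GEMM.
# Assumptions (not taken from the PR): one HWC image, HWIO weights,
# stride 1, no padding. All names here are illustrative.

def transform_weights(w):
    """Flatten HWIO weights into a (KH*KW*IC) x OC matrix.

    This depends only on the weights, so it can be done once ahead of time.
    """
    kh, kw, ic, oc = len(w), len(w[0]), len(w[0][0]), len(w[0][0][0])
    return [[w[y][x][c][o] for o in range(oc)]
            for y in range(kh) for x in range(kw) for c in range(ic)]


def im2col(data, kh, kw):
    """Turn each output pixel's receptive field into one matrix row."""
    h, wd, ic = len(data), len(data[0]), len(data[0][0])
    oh, ow = h - kh + 1, wd - kw + 1
    rows = [[data[i + y][j + x][c]
             for y in range(kh) for x in range(kw) for c in range(ic)]
            for i in range(oh) for j in range(ow)]
    return rows, oh, ow


def gemm(a, b):
    """Plain matrix multiply; Python ints play the role of int32 accumulators."""
    n = len(b[0])
    return [[sum(a_k * b[k][j] for k, a_k in enumerate(row)) for j in range(n)]
            for row in a]


def conv2d_gemm(data, w_mat, kh, kw):
    """Convolution as (OH*OW) x K times K x OC, reshaped to OH x OW x OC."""
    a, oh, ow = im2col(data, kh, kw)
    c = gemm(a, w_mat)
    return [c[i * ow:(i + 1) * ow] for i in range(oh)]


def conv2d_direct(data, w):
    """Reference implementation used to check the GEMM formulation."""
    kh, kw, ic, oc = len(w), len(w[0]), len(w[0][0]), len(w[0][0][0])
    oh, ow = len(data) - kh + 1, len(data[0]) - kw + 1
    return [[[sum(data[i + y][j + x][c] * w[y][x][c][o]
                  for y in range(kh) for x in range(kw) for c in range(ic))
              for o in range(oc)]
             for j in range(ow)]
            for i in range(oh)]
```

The split between `transform_weights` and `conv2d_gemm` mirrors the separation the description draws between the weight transform and the main algorithm: the flattened weight matrix is prepared once and reused for every input.
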
### High level description of the submission
The main algorithm lives in:
* topi/python/topi/arm_cpu/conv2d_gemm.py (schedule)
* topi/python/topi/arm_cpu/tensor_intrin.py (assembly+intrinsic)
* topi/python/topi/arm_cpu/conv2d_int8.py (driver)

The weight transform touches different files (since it is computed at compile time):
* topi/python/topi/arm_cpu/conv2d_alter_op.py
* python/tvm/relay/op/nn/_nn.py
* python/tvm/relay/op/nn/nn.py
* src/relay/op/nn/convolution.h (relay node definition)
* src/relay/op/nn/convolution.cc (relay node definition)
* include/tvm/relay/attrs/nn.h (relay node definition)

Strategies (mapping relay-node -> compute+schedules) are defined here:
* python/tvm/relay/op/strategy/arm_cpu.py
* python/tvm/relay/op/strategy/generic.py

You can view, comment on, or merge this pull request online at:

  https://github.com/apache/incubator-tvm/pull/5754

-- Commit Summary --

  * Improve quantized conv2d performance for armv8

-- File Changes --

    M python/tvm/relay/op/nn/_nn.py (18)
    M python/tvm/relay/op/nn/nn.py (87)
    M python/tvm/relay/op/strategy/arm_cpu.py (52)
    M python/tvm/relay/op/strategy/generic.py (13)
    M python/tvm/relay/qnn/op/legalizations.py (10)
    M src/relay/op/nn/convolution.cc (80)
    M src/relay/op/nn/convolution.h (104)
    M topi/python/topi/arm_cpu/conv2d_alter_op.py (36)
    A topi/python/topi/arm_cpu/conv2d_gemm.py (143)
    M topi/python/topi/arm_cpu/conv2d_int8.py (31)
    M topi/python/topi/arm_cpu/tensor_intrin.py (327)
    M topi/python/topi/generic/nn.py (19)
    M topi/python/topi/nn/conv2d.py (25)

-- Patch Links --

https://github.com/apache/incubator-tvm/pull/5754.patch
https://github.com/apache/incubator-tvm/pull/5754.diff