vvchernov commented on code in PR #13621:
URL: https://github.com/apache/tvm/pull/13621#discussion_r1057084844


##########
python/tvm/relay/op/strategy/x86.py:
##########
@@ -627,16 +627,16 @@ def batch_matmul_strategy_cpu(attrs, inputs, out_type, target):
     if (
         not attrs.transpose_a
         and attrs.transpose_b
-        and target_has_vnni(mcpu)
+        and target_has_avx512(mcpu)
         and inputs[0].dtype == "uint8"
         and inputs[1].dtype == "int8"
         and inputs[1].shape[-2] % 16 == 0
         and inputs[1].shape[-1] % 4 == 0
     ):
         strategy.add_implementation(
-            wrap_compute_batch_matmul(topi.x86.batch_matmul_vnni_compute, need_out_dtype=True),
-            wrap_topi_schedule(topi.x86.schedule_batch_matmul_vnni),
-            name="batch_matmul_vnni.x86",

Review Comment:
   Hello @cbalint13! Thank you for your nits and remarks! In this case VNNI 
support was not removed but extended: as you know, VNNI is part of the AVX512 
family. The fork between the two paths is here:
   
https://github.com/apache/tvm/blob/main/python/tvm/topi/x86/tensor_intrin.py#:~:text=def%20dot_16x1x16_uint8_int8_int32()%3A,return%20dot_16x1x16_uint8_int8_int32_skylake()
   As you correctly remarked, avx2 and ssse3 are also handled there, but they 
are not reachable because of the high-level target_has_avx512 check. Your 
suggestion may be a good way to resolve this later. For now I have only 
extended the existing approach to avx512.
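
   To make the two-level check concrete, here is a minimal sketch in plain Python. It is not the TVM code: `has_avx512`, `has_vnni`, `select_int8_dot`, `select_batch_matmul_impl` and the CPU name sets are hypothetical stand-ins for illustration only, and the returned strings merely label which path would be taken.

```python
# Hypothetical stand-ins for the target feature checks (not the TVM helpers).
# The CPU sets below are an assumed, incomplete subset used only for this sketch.
AVX512_CPUS = {"skylake-avx512", "cascadelake", "icelake-server"}
VNNI_CPUS = {"cascadelake", "icelake-server"}


def has_avx512(mcpu):
    """Return True if the (assumed) CPU name supports AVX512."""
    return mcpu in AVX512_CPUS


def has_vnni(mcpu):
    """Return True if the (assumed) CPU name supports AVX512 VNNI."""
    return mcpu in VNNI_CPUS


def select_int8_dot(mcpu):
    """Low-level fork: pick the VNNI path when available, else plain AVX512."""
    if has_vnni(mcpu):
        return "vnni"
    return "avx512-skylake"


def select_batch_matmul_impl(mcpu):
    """High-level gate: only AVX512-capable targets ever reach the fork."""
    if not has_avx512(mcpu):
        # avx2/ssse3 targets stop here, so any handling for them
        # further down is unreachable through this strategy.
        return None
    return select_int8_dot(mcpu)


if __name__ == "__main__":
    for cpu in ("cascadelake", "skylake-avx512", "haswell"):
        print(cpu, select_batch_matmul_impl(cpu))
```

   In this sketch an avx2-only CPU such as haswell never gets past the high-level gate, which is the situation described above.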



