masahi commented on code in PR #10450:
URL: https://github.com/apache/tvm/pull/10450#discussion_r848835556
##########
python/tvm/topi/cuda/tensorcore_alter_op.py:
##########
@@ -148,16 +148,18 @@ def _dense_legalize(attrs, inputs, arg_types):
# Pad input and output channels to use tensorcore schedule.
if dtype in ["float16", "int8", "uint8"]:
- # The shape of (M, K, N) must be multiple of (16, 16, 16) or (32, 16, 8) or (8, 16, 32)
+ # The shape of (M, K, N) must be multiple of
+ # (16, 16, 16) or (32, 16, 8) or (8, 16, 32) or (4, 4, 4)
if (
(M % 8 == 0 and K % 16 == 0 and N % 32 == 0)
or (M % 16 == 0 and K % 16 == 0 and N % 16 == 0)
or (M % 32 == 0 and K % 16 == 0 and N % 8 == 0)
+ or (M % 4 == 0 and K % 4 == 0 and N % 4 == 0)
):
# no need to pad
return None
- candidates = [(16, 16, 16), (32, 16, 8), (8, 16, 32)]
+ candidates = [(16, 16, 16), (32, 16, 8), (8, 16, 32), (4, 4, 4)]
Review Comment:
Can you try decoupling `(4, 4, 4)` padding from tensorcore stuff in this
file? The shape `(4, 4, 4)` is invalid for tensorcore.
For int8, rather than hard-coding `(4, 4, 4)` (which doesn't work for
other shapes such as 13), I think the right solution is to "pad each dim to the
nearest multiple of 4 greater than the given dim".
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]