masahi commented on code in PR #10450:
URL: https://github.com/apache/tvm/pull/10450#discussion_r848835556
##########
python/tvm/topi/cuda/tensorcore_alter_op.py:
##########
@@ -148,16 +148,18 @@ def _dense_legalize(attrs, inputs, arg_types):
# Pad input and output channels to use tensorcore schedule.
if dtype in ["float16", "int8", "uint8"]:
- # The shape of (M, K, N) must be multiple of (16, 16, 16) or (32, 16, 8) or (8, 16, 32)
+ # The shape of (M, K, N) must be multiple of
+ # (16, 16, 16) or (32, 16, 8) or (8, 16, 32) or (4, 4, 4)
if (
(M % 8 == 0 and K % 16 == 0 and N % 32 == 0)
or (M % 16 == 0 and K % 16 == 0 and N % 16 == 0)
or (M % 32 == 0 and K % 16 == 0 and N % 8 == 0)
+ or (M % 4 == 0 and K % 4 == 0 and N % 4 == 0)
):
# no need to pad
return None
- candidates = [(16, 16, 16), (32, 16, 8), (8, 16, 32)]
+ candidates = [(16, 16, 16), (32, 16, 8), (8, 16, 32), (4, 4, 4)]
Review Comment:
Can you try decoupling `(4, 4, 4)` padding from tensorcore stuff in this
file? The shape `(4, 4, 4)` is invalid for tensorcore.
For int8, rather than hard-coding `(4, 4, 4)` (which doesn't work for
other shapes such as 13), I think the right solution is to "pad each dim to the
nearest multiple of 4 greater than the given dim".
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]