anwang2009 commented on code in PR #10450:
URL: https://github.com/apache/tvm/pull/10450#discussion_r848938955


##########
python/tvm/topi/cuda/tensorcore_alter_op.py:
##########
@@ -170,7 +172,19 @@ def _dense_legalize(attrs, inputs, arg_types):
 
     if extra_flops_ratio > 2:
         logger.info("dense pad_to_tensorcore skipped, extra_flops_ratio %s", extra_flops_ratio)
-        return None
+
+        # If tensorcore schedule padding fails, pad to nearest upward 4x4x4 as long as
+        # the additional flops ratio isn't double or more.
+        # Note that 4x4x4 is invalid for tensorcore scheduling, but padding upwards to 4x4x4
+        # doesn't hurt if tensorcore padding has already failed.
+        if M % 4 == 0 and K % 4 == 0 and N % 4 == 0:
+            # No need to pad
+            return None
+        (dm, dk, dn) = _pad_to(M, K, N, (4, 4, 4))
+        extra_flops_ratio = _extra_flops(M, K, N, dm, dk, dn) / (M * K * N)
+
+        if extra_flops_ratio > 2:
+            return None

Review Comment:
   @masahi I moved the 4x4x4 treatment to this block. Is this what you meant, or did you want the 4x4x4 logic outside of `tensorcore_alter_op.py` entirely?
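
   For reference, the fallback in this hunk can be sketched as standalone Python. This is only an illustration under stated assumptions: `pad_to`, `extra_flops`, and `fallback_padding` are hypothetical stand-ins rather than the actual helpers in `tensorcore_alter_op.py`, and the sketch assumes the padding helper returns per-dimension deltas `(dm, dk, dn)` and that the extra-flops ratio is measured against the unpadded `M * K * N`.

```python
def pad_to(m, k, n, multiple):
    """Return (dm, dk, dn): how much to add so each dim is a multiple of its target."""
    pm, pk, pn = multiple
    # (-x) % p is the distance from x up to the next multiple of p (0 if aligned).
    return (-m) % pm, (-k) % pk, (-n) % pn


def extra_flops(m, k, n, dm, dk, dn):
    """Additional multiply-accumulate work introduced by the padding."""
    return (m + dm) * (k + dk) * (n + dn) - m * k * n


def fallback_padding(m, k, n, multiple=(4, 4, 4), max_ratio=2):
    """Hypothetical mirror of the hunk: pad to 4x4x4 unless aligned or too costly.

    Returns None when no padding is needed or when the extra work exceeds
    max_ratio times the original flops; otherwise returns (dm, dk, dn).
    """
    dm, dk, dn = pad_to(m, k, n, multiple)
    if (dm, dk, dn) == (0, 0, 0):
        return None  # already aligned, nothing to pad
    ratio = extra_flops(m, k, n, dm, dk, dn) / (m * k * n)
    if ratio > max_ratio:
        return None  # padding would cost too much
    return dm, dk, dn
```

   With these assumptions, an aligned shape such as `(8, 12, 16)` and a tiny shape such as `(1, 1, 1)` both yield `None`, the first because no padding is needed and the second because the padded work dwarfs the original, while `(5, 7, 9)` is padded up to `(8, 8, 12)`.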


