masahi commented on a change in pull request #9419:
URL: https://github.com/apache/tvm/pull/9419#discussion_r740677582



##########
File path: python/tvm/contrib/cutlass/gen_gemm.py
##########
@@ -335,6 +347,16 @@ def check_align(self, op_name, M):
             return False
         return True
 
+    def get_default(self, out_dtype):
+        """Return the default kernel for the requested architecture.
+        For now, the default kernel was picked arbitrary.
+        """
+        ops = GENERATOR_FUNC_TABLE[self.sm](out_dtype)
+        default_kernel_name = DEFAULT_KERNELS[self.sm][out_dtype]

Review comment:
       Added an additional check in the constructor: https://github.com/apache/tvm/blob/6fb9e98090733ed5243a2cca10910cebfab83a76/python/tvm/contrib/cutlass/gen_gemm.py#L339
   
   

##########
File path: python/tvm/contrib/cutlass/build.py
##########
@@ -105,9 +131,19 @@ def tune_cutlass_kernels(mod, sm, profile_all=True, use_multiprocessing=False, t
             MM = arg0_shape[0]
             KK = arg0_shape[1]
             NN = arg1_shape[0]
-            out = cutlass_profiler.profile(
-                MM, NN, KK, annotator.signature["ret_dtype"], profile_all, use_multiprocessing
-            )
+            out_dtype = annotator.signature["ret_dtype"]
+            if any(isinstance(s, tvm.tir.Any) for s in [MM, KK, NN]):
+                out = cutlass_profiler.get_default(out_dtype)
+                print("Picked the default kernel " + out["name"])

Review comment:
       Replaced with `logging.info`, but turning on logging in the test script also produces logs from other modules:
   
   ```
   $ python test_cutlass.py 
   INFO:cutlass:Picked the default kernel cutlass_tensorop_h16816gemm_128x256_32x3_tn_align4
   INFO:te_compiler:Using injective.cpu for add based on highest priority (10)
   INFO:te_compiler:Using injective.cpu for divide based on highest priority (10)
   INFO:te_compiler:Using reduce.cpu for prod based on highest priority (10)
   INFO:te_compiler:Using injective.cpu for multiply based on highest priority (10)
   INFO:te_compiler:Using injective.cpu for cast based on highest priority (10)
   INFO:te_compiler:Using injective.cpu for add based on highest priority (10)
   INFO:te_compiler:Using injective.cpu for divide based on highest priority (10)
   INFO:te_compiler:Using reduce.cpu for prod based on highest priority (10)
   INFO:te_compiler:Using injective.cpu for multiply based on highest priority (10)
   INFO:te_compiler:Using dense_cublas.cuda for nn.dense based on highest priority (25)
   ```
   
   Do you know how to log only the `cutlass` ones? I did a quick search but didn't find the right way.
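
   One possible approach, sketched with the standard `logging` module only: leave the root logger at `WARNING` and raise just the named logger to `INFO`. The logger names `cutlass` and `te_compiler` are assumptions taken from the prefixes in the log output above, and `enable_only` is a hypothetical helper, not something that exists in TVM.

   ```python
   import io
   import logging


   def enable_only(logger_name, stream):
       """Hypothetical helper: emit INFO records from `logger_name` only."""
       root = logging.getLogger()
       handler = logging.StreamHandler(stream)
       handler.setFormatter(logging.Formatter("%(levelname)s:%(name)s:%(message)s"))
       root.addHandler(handler)
       # Loggers without an explicit level inherit WARNING from the root,
       # so their INFO records are dropped before reaching any handler.
       root.setLevel(logging.WARNING)
       # Only the named logger is lowered to INFO; its records still
       # propagate to the handler attached to the root logger.
       logging.getLogger(logger_name).setLevel(logging.INFO)
       return handler


   buf = io.StringIO()
   enable_only("cutlass", buf)

   # Stand-ins for the two modules seen in the log output above.
   logging.getLogger("cutlass").info("Picked the default kernel")
   logging.getLogger("te_compiler").info("Using injective.cpu for add")

   output = buf.getvalue()
   ```

   In a test script, the same effect should follow from calling `logging.basicConfig()` without a `level` argument (which leaves the root logger at `WARNING`) and then `logging.getLogger("cutlass").setLevel(logging.INFO)`.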

##########
File path: python/tvm/contrib/cutlass/build.py
##########
@@ -105,9 +131,19 @@ def tune_cutlass_kernels(mod, sm, profile_all=True, use_multiprocessing=False, t
             MM = arg0_shape[0]
             KK = arg0_shape[1]
             NN = arg1_shape[0]
-            out = cutlass_profiler.profile(
-                MM, NN, KK, annotator.signature["ret_dtype"], profile_all, use_multiprocessing
-            )
+            out_dtype = annotator.signature["ret_dtype"]
+            if any(isinstance(s, tvm.tir.Any) for s in [MM, KK, NN]):
+                out = cutlass_profiler.get_default(out_dtype)
+                print("Picked the default kernel " + out["name"])

Review comment:
       OK, if this is not simple, I'll defer it for now.



