[GitHub] [tvm] Hzfengsy commented on a diff in pull request #14673: [MetaSchedule] Introduce MMA Tensor Core Multilevel Tiling

via GitHub Sat, 24 Jun 2023 19:53:20 -0700


Hzfengsy commented on code in PR #14673:
URL: https://github.com/apache/tvm/pull/14673#discussion_r1241007045



##########
include/tvm/tir/transform.h:
##########
@@ -481,7 +481,7 @@ TVM_DLL Pass ConvertBlocksToOpaque();
  * Before narrowing, `B` is a `[16, 16]` buffer, but only a skinny vector `B[i, 0:16]` is accessed.
  *
  *  \code
- *
+ *∏

Review Comment:
   Typo here: stray `∏` character.



##########
src/driver/driver_api.cc:
##########
@@ -202,14 +202,18 @@ Array<tvm::transform::Pass> CreatePassList(bool disable_loop_partition) {
   pass_list.push_back(tir::transform::LowerInitBlock());
   pass_list.push_back(tir::transform::PlanAndUpdateBufferAllocationLocation());
   pass_list.push_back(tir::transform::ConvertBlocksToOpaque());
-  pass_list.push_back(tir::transform::UnifyThreadBinding());
   pass_list.push_back(tir::transform::ManifestSharedMemoryLocalStage());
   pass_list.push_back(tir::transform::CompactBufferAllocation());
   pass_list.push_back(tir::transform::LowerAutoCopy());
+  pass_list.push_back(tir::transform::UnifyThreadBinding());
   pass_list.push_back(tir::transform::LowerMatchBuffer());
+  pass_list.push_back(tir::transform::Simplify());
+  pass_list.push_back(tir::transform::InjectPermutedLayout());
+  pass_list.push_back(tir::transform::Simplify());
   pass_list.push_back(tir::transform::InjectSoftwarePipeline());
-  pass_list.push_back(tir::transform::LowerOpaqueBlock());
+  pass_list.push_back(tir::transform::TransformMmaBufferLayout());
   pass_list.push_back(tir::transform::FlattenBuffer());
+  pass_list.push_back(tir::transform::LowerOpaqueBlock());

Review Comment:
   Why should `LowerOpaqueBlock` be called after `FlattenBuffer`? I know it needs to come after `TransformMmaBufferLayout`, but could it run before `FlattenBuffer`?
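The ordering question can be made concrete with a small, hypothetical check. The sketch below models the pass list as plain strings (it does not call TVM) and encodes only the one dependency stated in the comment above, that `LowerOpaqueBlock` must run after `TransformMmaBufferLayout`. Any other dependencies between these passes are deliberately not modeled; `PROPOSED`, `ALTERNATIVE`, and `violations` are illustrative names, not TVM APIs.

```python
from typing import List, Tuple

# Hypothetical model of the tail of the pass list, in the order the PR proposes.
PROPOSED = [
    "InjectSoftwarePipeline",
    "TransformMmaBufferLayout",
    "FlattenBuffer",
    "LowerOpaqueBlock",
]

# The alternative asked about in the review: LowerOpaqueBlock before FlattenBuffer.
ALTERNATIVE = [
    "InjectSoftwarePipeline",
    "TransformMmaBufferLayout",
    "LowerOpaqueBlock",
    "FlattenBuffer",
]

# Each pair (earlier, later) means `earlier` must appear before `later`.
# Only the constraint stated in the review is encoded here.
CONSTRAINTS: List[Tuple[str, str]] = [
    ("TransformMmaBufferLayout", "LowerOpaqueBlock"),
]


def violations(passes: List[str], constraints: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Return the ordering constraints that the given pass list breaks."""
    pos = {name: i for i, name in enumerate(passes)}
    return [(a, b) for a, b in constraints if pos[a] >= pos[b]]


print(violations(PROPOSED, CONSTRAINTS))     # []
print(violations(ALTERNATIVE, CONSTRAINTS))  # []
```

Both orders satisfy the stated dependency, which is the point of the question: unless `FlattenBuffer` must itself precede `LowerOpaqueBlock` (a constraint the diff does not state), the proposed position is not forced by `TransformMmaBufferLayout` alone.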



##########
python/tvm/tir/tensor_intrin/cuda.py:
##########
@@ -530,17 +530,17 @@ def mma_store_impl(a: T.handle, c: T.handle) -> None:
 
 MMA_store_16x16_f32_global_INTRIN = "mma_store_16x16_f32_global_"
 TensorIntrin.register(
-    MMA_store_16x16_f32_global_INTRIN, *get_mma_store_intrin("float32", 8, "global")
+    MMA_store_16x16_f32_global_INTRIN, *get_mma_stmatrix_intrin("float32", 8, "global")

Review Comment:
   Any reason to rename it (`get_mma_store_intrin` → `get_mma_stmatrix_intrin`)?



##########
python/tvm/tir/schedule/schedule.py:
##########
@@ -402,6 +402,46 @@ def sample_perfect_tile(
             )
         )
 
+    @type_checked
+    def sample_partitioned_tile(
+        self,
+        loop: LoopRV,
+        n: int,
+        partition_pos: int = 0,
+        innerpart_factor: int = 1,
+        decision: Optional[List[int]] = None,
+    ) -> List[ExprRV]:
+        """Sample the factors to a partitioned tile for a specific loop
+
+        Parameters
+        ----------
+        loop : LoopRV
+            The loop to be tiled
+        n : int
+            The number of tiles to be sampled
+        partition_pos : int
+            The position to partition tiles to two parts
+        innerpart_factor : int
+            The factor of the second part
+        decision: Optional[List[int]]
+            The sampling decision, if any
+
+        Returns
+        -------
+        result : List[ExprRV]
+            A list of length `n`, the random partitioned tile sizes sampled
+        """
+        return list(

Review Comment:
   Not sure the explicit `list(...)` conversion is necessary.
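The docstring for `sample_partitioned_tile` leaves the exact semantics of `partition_pos` and `innerpart_factor` open. The pure-Python sketch below is one possible reading, labeled as an assumption and not TVM's sampler: the `n` factors multiply to the loop extent (as in `sample_perfect_tile`), and the product of the "second part" (factors at index `partition_pos` and later) must be divisible by `innerpart_factor`. The helpers `perfect_tilings` and `sample_partitioned_tile_sketch` are hypothetical.

```python
import math
import random
from typing import List, Optional


def divisors(x: int) -> List[int]:
    """All positive divisors of x, in increasing order."""
    return [d for d in range(1, x + 1) if x % d == 0]


def perfect_tilings(extent: int, n: int) -> List[List[int]]:
    """All ways to write `extent` as an ordered product of `n` positive factors."""
    if n == 1:
        return [[extent]]
    out = []
    for d in divisors(extent):
        for rest in perfect_tilings(extent // d, n - 1):
            out.append([d] + rest)
    return out


def sample_partitioned_tile_sketch(
    extent: int,
    n: int,
    partition_pos: int = 0,
    innerpart_factor: int = 1,
    seed: Optional[int] = None,
) -> List[int]:
    """Hypothetical sampler (assumed semantics, not TVM's implementation):
    pick a perfect tiling whose second part, the factors at index
    >= partition_pos, has a product divisible by innerpart_factor."""
    candidates = [
        t for t in perfect_tilings(extent, n)
        if math.prod(t[partition_pos:]) % innerpart_factor == 0
    ]
    if not candidates:
        raise ValueError("no tiling satisfies the partition constraint")
    return random.Random(seed).choice(candidates)


tiles = sample_partitioned_tile_sketch(64, 4, partition_pos=2, innerpart_factor=8, seed=0)
print(tiles, math.prod(tiles), math.prod(tiles[2:]) % 8)  # product is 64, remainder is 0
```

The sketch returns a plain Python `list` by construction; whether the real binding's return value already behaves like one is the question raised in the comment above, and this model does not settle it.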



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.