vinx13 commented on code in PR #14009:
URL: https://github.com/apache/tvm/pull/14009#discussion_r1115043317
##########
python/tvm/meta_schedule/schedule_rule/multi_level_tiling.py:
##########
@@ -59,6 +59,9 @@ class MultiLevelTiling(ScheduleRule):
vector_load_lens : Optional[List[int]]
The length of vector lane in vectorized cooperative fetching.
None means disable vectorization
+ stages: Optional[List[int]]
+ All available stages for async pipeline. Stage should be in {4, 5}.
Review Comment:
Document what the values `4` and `5` mean here.
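For illustration, a hedged sketch of the kind of docstring expansion the reviewer is asking for. The interpretation of `4` and `5` as candidate numbers of software-pipeline stages is an assumption made for this sketch, not something the diff confirms:

```python
# Hypothetical expanded docstring for the `stages` parameter of
# MultiLevelTiling. The meaning given to 4 and 5 below is an assumption
# for illustration; the actual semantics should come from the PR author.
STAGES_DOC = """\
stages : Optional[List[int]]
    Candidate numbers of software-pipeline stages to sample for the
    async pipeline (e.g. 4 or 5 stages of overlapped asynchronous
    shared-memory copies). None disables async pipelining.
"""
```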
##########
src/meta_schedule/schedule_rule/schedule_rule.cc:
##########
@@ -140,6 +142,46 @@ Array<ScheduleRule> ScheduleRule::DefaultCUDA() {
/*tile_binds=*/Array<String>{"blockIdx.x", "vthread.x",
"threadIdx.x"},
/*max_innermost_factor=*/Integer(64),
/*vector_load_lens=*/Array<Integer>{1, 2, 3, 4, 8, 16},
+ /*stages=*/Array<Integer>{4, 5},
+ /*reuse_read=*/
+ Map<String, ObjectRef>{{"req", String("must")},
+ {"levels", Array<Integer>{4}}, //
+ {"scope", String("shared")}},
+ /*reuse_write=*/
+ Map<String, ObjectRef>{{"req", String("must")},
+ {"levels", Array<Integer>{3}}, //
+ {"scope", String("local")}}),
+ ScheduleRule::InlineConstantScalars(),
+ ScheduleRule::AutoInline(
+ /*into_producer=*/true,
+ /*into_consumer=*/true,
+ /*inline_const_tensor=*/true,
+ /*disallow_if_then_else=*/false,
+ /*require_injective=*/false,
+ /*require_ordered=*/false,
+ /*disallow_op=*/Array<String>{}),
+ ScheduleRule::CrossThreadReduction(
+ /*thread_extents=*/Array<Integer>{4, 8, 16, 32, 64, 128, 256, 512}),
+ ScheduleRule::ParallelizeVectorizeUnroll(
+ /*max_jobs_per_core=*/-1,
+ /*max_vectorize_extent=*/-1,
+ /*unroll_max_steps=*/Array<Integer>{0, 16, 64, 512, 1024},
+ /*unroll_explicit=*/true),
+ ScheduleRule::AutoBind(
+ /*max_threadblocks=*/256,
+             /*thread_extents=*/Array<Integer>{32, 64, 128, 256, 512, 1024}),
+ };
+}
+
+Array<ScheduleRule> ScheduleRule::DefaultCUDAWithoutAsync() {
Review Comment:
A few places need to be updated to dispatch to `DefaultCUDA` or
`DefaultCUDAWithoutAsync` based on the target arch:
https://github.com/apache/tvm/blob/cda8b2fc10138c8c3183daa459e1951da313fbb4/src/meta_schedule/schedule_rule/schedule_rule.cc#L281
https://github.com/apache/tvm/blob/main/src/meta_schedule/space_generator/space_generator.cc#L90
Is it possible to unify both configs into a single `DefaultCUDA` interface,
with the dispatch by arch done inside?
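The unification could look roughly like the following Python sketch. The function name `default_cuda_rules` and the rule names are illustrative assumptions, not TVM's actual API; in the real codebase the dispatch would live inside the C++ `DefaultCUDA`. The sm_80 threshold reflects that `cp.async` (which the async pipeline stages rely on) requires Ampere or newer:

```python
def default_cuda_rules(compute_capability: int) -> list:
    """Sketch: pick the schedule-rule set for a given sm_XX arch.

    Hypothetical stand-in for dispatching inside DefaultCUDA rather
    than exposing two separate entry points.
    """
    base = [
        "AutoInline",
        "CrossThreadReduction",
        "ParallelizeVectorizeUnroll",
        "AutoBind",
    ]
    # cp.async is only available on sm_80 and newer, so only those
    # archs get the async-pipeline tiling variant (stages {4, 5});
    # older archs fall back to the no-async rule set.
    if compute_capability >= 80:
        return ["MultiLevelTilingWithAsync"] + base
    return ["MultiLevelTiling"] + base
```

Callers would then pass the target's compute capability once, instead of choosing between `DefaultCUDA` and `DefaultCUDAWithoutAsync` at each call site.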
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]