vinx13 commented on code in PR #14009:
URL: https://github.com/apache/tvm/pull/14009#discussion_r1115043317
##########
python/tvm/meta_schedule/schedule_rule/multi_level_tiling.py:
##########
@@ -59,6 +59,9 @@ class MultiLevelTiling(ScheduleRule):
vector_load_lens : Optional[List[int]]
The length of vector lane in vectorized cooperative fetching.
None means disable vectorization
+ stages: Optional[List[int]]
+ All available stages for async pipeline. Stage should be in {4, 5}.
Review Comment:
Document what the values `4` and `5` mean here.
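For illustration, a hedged sketch of the kind of docstring expansion the reviewer is asking for. The interpretation of `4` and `5` as candidate numbers of software-pipeline stages is an assumption made for this sketch, not something the diff confirms:

```python
# Hypothetical expanded docstring for the `stages` parameter of
# MultiLevelTiling. The meaning given to 4 and 5 below is an assumption
# for illustration; the actual semantics should come from the PR author.
STAGES_DOC = """\
stages : Optional[List[int]]
    Candidate numbers of software-pipeline stages to sample for the
    async pipeline (e.g. 4 or 5 stages of overlapped asynchronous
    shared-memory copies). None disables async pipelining.
"""
```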
##########
src/meta_schedule/schedule_rule/schedule_rule.cc:
##########
@@ -140,6 +142,46 @@ Array<ScheduleRule> ScheduleRule::DefaultCUDA() {
/*tile_binds=*/Array<String>{"blockIdx.x", "vthread.x",
"threadIdx.x"},
/*max_innermost_factor=*/Integer(64),
/*vector_load_lens=*/Array<Integer>{1, 2, 3, 4, 8, 16},
+ /*stages=*/Array<Integer>{4, 5},
+ /*reuse_read=*/
+ Map<String, ObjectRef>{{"req", String("must")},
+ {"levels", Array<Integer>{4}}, //
+ {"scope", String("shared")}},
+ /*reuse_write=*/
+ Map<String, ObjectRef>{{"req", String("must")},
+ {"levels", Array<Integer>{3}}, //
+ {"scope", String("local")}}),
+ ScheduleRule::InlineConstantScalars(),
+ ScheduleRule::AutoInline(
+ /*into_producer=*/true,
+ /*into_consumer=*/true,
+ /*inline_const_tensor=*/true,
+ /*disallow_if_then_else=*/false,
+ /*require_injective=*/false,
+ /*require_ordered=*/false,
+ /*disallow_op=*/Array<String>{}),
+ ScheduleRule::CrossThreadReduction(
+ /*thread_extents=*/Array<Integer>{4, 8, 16, 32, 64, 128, 256, 512}),
+ ScheduleRule::ParallelizeVectorizeUnroll(
+ /*max_jobs_per_core=*/-1,
+ /*max_vectorize_extent=*/-1,
+ /*unroll_max_steps=*/Array<Integer>{0, 16, 64, 512, 1024},
+ /*unroll_explicit=*/true),
+ ScheduleRule::AutoBind(
+ /*max_threadblocks=*/256,
+             /*thread_extents=*/Array<Integer>{32, 64, 128, 256, 512, 1024}),
+ };
+}
+
+Array<ScheduleRule> ScheduleRule::DefaultCUDAWithoutAsync() {
Review Comment:
A few places need to be updated to dispatch to `DefaultCUDA` or
`DefaultCUDAWithoutAsync` based on the target arch:
https://github.com/apache/tvm/blob/cda8b2fc10138c8c3183daa459e1951da313fbb4/src/meta_schedule/schedule_rule/schedule_rule.cc#L281
https://github.com/apache/tvm/blob/main/src/meta_schedule/space_generator/space_generator.cc#L90
Is it possible to unify both configs into a single `DefaultCUDA` interface,
with the dispatch by arch done inside?
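The unification could look roughly like the following Python sketch. The function name `default_cuda_rules` and the rule names are illustrative assumptions, not TVM's actual API; in the real codebase the dispatch would live inside the C++ `DefaultCUDA`. The sm_80 threshold reflects that `cp.async` (which the async pipeline stages rely on) requires Ampere or newer:

```python
def default_cuda_rules(compute_capability: int) -> list:
    """Sketch: pick the schedule-rule set for a given sm_XX arch.

    Hypothetical stand-in for dispatching inside DefaultCUDA rather
    than exposing two separate entry points.
    """
    base = [
        "AutoInline",
        "CrossThreadReduction",
        "ParallelizeVectorizeUnroll",
        "AutoBind",
    ]
    # cp.async is only available on sm_80 and newer, so only those
    # archs get the async-pipeline tiling variant (stages {4, 5});
    # older archs fall back to the no-async rule set.
    if compute_capability >= 80:
        return ["MultiLevelTilingWithAsync"] + base
    return ["MultiLevelTiling"] + base
```

Callers would then pass the target's compute capability once, instead of choosing between `DefaultCUDA` and `DefaultCUDAWithoutAsync` at each call site.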
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]