[GitHub] [tvm-rfcs] areusch commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

GitBox Mon, 26 Jul 2021 17:35:24 -0700


areusch commented on a change in pull request #5:
URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r677029066




##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,604 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: https://github.com/apache/tvm-rfcs/pull/5/
+* GitHub Issue: https://github.com/apache/tvm/issues/8473
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a scheduling DSL on TIR that unifies 
the
+approaches of AutoTVM [1] and AutoScheduler [2]. Meta schedule provides a 
pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR 
schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the 
automation system.
+
+Meta Schedule is the 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling.** In TVM,
+both TensorIR and TE allow direct or indirect IR transformation guided by a

Review comment:
       can we specify more about the IR transformation here? e.g. we need to 
answer the question: transform from what to what? otherwise you're confused 
unless you know TVM already.
   
   I think it's confusing to explain because the transformation can either be 
from TE -> TensorIR or TensorIR -> TensorIR.
   
   here's a stab: "TVM initially describes all model operators using an 
abstract description of the computation. Such abstractions can be described 
either in the Tensor Expression IR (the standard prior to Meta Scheduling) or 
in TensorIR (as a naïve computation). Through a process known as 
**scheduling**, TVM allows transformation of these IR to an imperative, 
optimized description of the implmented computation. Such transformation is 
guided by a developer-provided program..."

##########
File path: rfcs/0001-meta-schedule-autotensorir.md
##########
@@ -0,0 +1,604 @@
+<!--- Licensed to the Apache Software Foundation (ASF) under one -->
+<!--- or more contributor license agreements.  See the NOTICE file -->
+<!--- distributed with this work for additional information -->
+<!--- regarding copyright ownership.  The ASF licenses this file -->
+<!--- to you under the Apache License, Version 2.0 (the -->
+<!--- "License"); you may not use this file except in compliance -->
+<!--- with the License.  You may obtain a copy of the License at -->
+
+<!---   http://www.apache.org/licenses/LICENSE-2.0 -->
+
+<!--- Unless required by applicable law or agreed to in writing, -->
+<!--- software distributed under the License is distributed on an -->
+<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY -->
+<!--- KIND, either express or implied.  See the License for the -->
+<!--- specific language governing permissions and limitations -->
+<!--- under the License. -->
+
+* Feature Name: Meta Schedule (Formerly AutoTIR)
+* Start Date: 2021-05-28
+* RFC PR: https://github.com/apache/tvm-rfcs/pull/5/
+* GitHub Issue: https://github.com/apache/tvm/issues/8473
+
+## 1. Summary
+
+This proposal introduces Meta Schedule: a scheduling DSL on TIR that unifies 
the
+approaches of AutoTVM [1] and AutoScheduler [2]. Meta schedule provides a 
pragmatic way to define
+the space of automatic tuning, extensibility in terms of all possible TIR 
schedule primitives like
+tensorization and loop partitioning, and customizability on every layer of the 
automation system.
+
+Meta Schedule is the 3rd generation automatic scheduling system.
+
+## 2. Motivation
+
+**Scheduling.** In TVM,
+both TensorIR and TE allow direct or indirect IR transformation guided by a
+developer-provided program, for example,
+specifying a particular reordering of loops for better locality,
+or tensorizing a compute region with specific hardware intrinsics.
+The process of invoking such a set of pre-defined transformations is called 
"**scheduling**",
+and each of such transformations is called a "**schedule primitive**".
+These primitives form a domain-specific language (DSL).
+
+**Design space.** The set of all possible schedulings of a TE/TensorIR is 
called its design space.
+Optimization in TVM is essentially exploring such space to find out a 
scheduling that transforms the
+IR to generate the kernel with optimal performance.
+
+### Problems with the current scheduling system
+
+Currently there are 3 sets of scheduling APIs in TVM:
+* **Manual schedule**: Developers optimize their programs by manually invoking 
schedule primitives,
+  i.e. explore points in the design space with humans in the loop. This can be 
a tedious and
+  error-prone approach, hence the creation of AutoTVM and AutoScheduler.
+* **AutoTVM**: The automation system requires users to define the design space 
through
+  per-operator "schedule templates." Therefore, programmer time is a 
bottleneck in scaling
+  to hundreds of operators across many hardware platforms.
+  hardware platforms.
+* **AutoScheduler**: It automatically generates schedule templates as the 
design space,
+  according to a set of predefined "search rules". However, it is non-trivial 
to extend
+  AutoScheduler to new schedule primitives (tensorize, loop partition, 
software pipelining, etc).
+* The three systems above have isolated sets of APIs with several layers of 
their own abstraction,
+  which are not only hard to learn, but also engineering-intensive to 
customize.
+
+### Benefits of Meta Schedule
+
+The existing three scheduling systems are mutually incompatible with each 
other in terms of API
+design and divergence: besides manual TE scheduling, AutoTVM requires users to 
learn a new set of
+APIs, and AutoScheduler brings in another set of C++-based search rules. It 
adds the users' mental
+overhead to understand and extend the existing systems. Further, the inability 
to switch between
+template-based and template-free auto-tuning could lead to inferior 
customizability and hence
+make it needlessly difficult to achieve optimal performance.
+
+Meta schedule provides:
+* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of 
abstraction.
+* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and 
AutoScheduler-style
+  schedules.
+* Extensibility of all the schedule primitives, including tensorization and 
loop partitioning.
+  Almost no extra effort is needed to use a new primitive in auto-tuning.
+* The automation infrastructure is extensible on every of its components. 
Every component of the
+  system can be customized easily in pure python or C++ or both. For example, 
one can develop a new
+  design space generator in python, a new ProgramRunner in python, etc.
+
+
+## 3. Guide-level explanation
+
+Meta Schedule DSL is a language that provides TVM backend developers
+a flexible way to define or auto-generate the operator design space.
+
+This section introduces the syntax of Meta Schedule DSL by describing the 5 
common usage patterns
+envisioned by this RFC. These patterns are:
+1) Manually constructing a schedule using existing schedule primitives 
(Section 3.1);
+2) Defining composite schedule to simplify the ap sequence of schedule 
primitives (Section 3.2);
+3) Describing a design space of possible schedules,
+a.k.a. AutoTVM-style schedule templates (Section 3.3);
+4) Automatically generating the design space, a.k.a. AutoScheduler-style 
search rules (Section 3.4);
+5) Mixing the usage of manual schedule, AutoTVM and AutoScheduler-style design 
space specification
+in Meta Schedule (Section 3.5).
+
+### 3.1. Manual Schedule
+
+Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is 
an example of a manual
+schedule for matrix multiplication:
+
+```python
+# Designate a set of tile sizes
+i_tiles = [16, 8, 8, 8]
+j_tiles = [16, 8, 8, 8]
+k_tiles = [256, 8]
+
+# Tile the loops according to the tile sizes
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+In this manual scheduling example, the developers tweak the tile sizes and 
measure the performance of the
+generated kernels to explore the opportunities of potential optimization.
+
+Generally speaking, while writing a schedule, there are often some parameters 
that are hard to
+determine ahead of time, for example, tile sizes, unroll steps, or which 
tensor intrinsics to use.
+Developers may manually enumerate possible combinations of these unknown 
factors, and then pick the
+best schedule according to measurement results on their device.
+
+### 3.2. Composite Schedule Rules
+
+As introduced in the previous section, in TensorIR, each schedule primitive 
handles only a very
+basic transformation of the IR. For example, `split` only splits a loop into 
two new loops. In the
+real world, the over-fine granularity of those primitives usually leads to 
repetitive and verbose
+scheduling code. Take the code snippet in the previous section as an example: 
a sequence of `split`s
+are invoked, followed by a `reorder`. Taken together these 4 primitives are 
colloquially known as
+"SSRSRS" tiling.
+
+To make it more convenient and modular, users are allowed to register 
**composite schedules** that apply
+a sequence of schedule primitives according to certain analysis of the IR.
+The word **composite** here means the schedule transformation is *composed* of 
those **primitives**.
+
+For example, suppose there is a composite schedule called 
`Inline-Elementwise-Operation`, which
+inlines elementwise computation into their consumers if possible. Applying it 
to the
+following TensorIR:
+
+```python
[email protected]
+def example_func(...):
+  for i, j in ...:
+    with tir.Block("B") ...:
+      B[i, j] = A[i, j] + 1
+  for i, j in ...:
+    with tir.Block("C") ...:
+      C[i, j] = B[i, j] + 1
+  for i, j in ...:
+    with tir.Block("D") ...:
+      D[i, j] = C[i, j] + 1
+
+sch = tir.Schedule(example_func)
+# `InlineElementwiseOperation` is a composite schedule rule that analyzes a 
given block.
+# If the block contains only elementwise computation, and can be inlined into 
its consumer,
+# then `sch.compute_inline` is called on that block.
+inliner = InlineElementwiseOperation()
+inliner.apply(schedule=sch, block=sch.get_block("B"))
+inliner.apply(schedule=sch, block=sch.get_block("C"))
+inliner.apply(schedule=sch, block=sch.get_block("D"))
+```
+
+Below is the result after applying this composite schedule, and its 
corresponding trace:
+
+```python
+
+>>> print(tvm.script.asscript(sch.mod))
[email protected]
+def example_func(...):
+  for i, j in ...:
+    with tir.Block("D") ...:
+      D[i, j] = A[i, j] + 1 + 1 + 1
+
+>>> print(sch.trace)
+# Block "B" is elementwise and inlinable, then `sch.compute_inline(B)` is 
called
+B = sch.get_block("B")
+sch.compute_inline(B)
+# Block "C" is elementwise and inlinable, then `sch.compute_inline(C)` is 
called
+C = sch.get_block("C")
+sch.compute_inline(C)
+# Block "D" is elementwise but does not have a consumer,
+# so the rule does not call `compute_inline` because it is not inlinable
+D = sch.get_block("D")
+```
+
+### 3.3. AutoTVM-style Design Space Description
+
+Meta schedule extends the schedule DSL with a set of new schedule primitives
+called **sampling instructions**. These primitives do not transform the 
TensorIR,
+but instead introduce random statistical variables which can be referenced 
later in scheduling
+to parameterize the schedule. Incorporating **sampling instructions** into a 
operator's schedule
+allows the backend developers to succinctly describe a design space in terms of
+tiling strategies, fusion levels, unroll lengths, etc.
+
+The matmul example above is extended to cover all possible tilings using these 
sampling
+instructions:
+
+```python
+# Sample tile sizes
+i_tiles = sch.sample_perfect_tile(i, n=4)  # was: i_tiles = [16, 8, 8, 8]
+j_tiles = sch.sample_perfect_tile(j, n=4)  # was: j_tiles = [16, 8, 8, 8]
+k_tiles = sch.sample_perfect_tile(k, n=2)  # was: k_tiles = [256, 8]
+# Tile the loops according to the random variables
+i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles)
+j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles)
+k_0, k_1           = sch.split(loop=k, factors=k_tiles)
+# Organize the loops into "SSRSRS" 6-level tiles
+sch.reorder(
+    i_0, j_0, # S: the 1st spatial tile
+    i_1, j_1, # S: the 2nd spatial tile
+    k_0,      # R: the 1st reduction tile
+    i_2, j_2, # S: the 3rd spatial tile
+    k_1,      # R: the 2nd reduction tile
+    i_3, j_3, # S: the 4th spatial tile
+)
+```
+
+### 3.4. AutoScheduler-style Design Space Generation
+
+To generate design space, AutoScheduler applies a set of rules to each
+[TE stage](https://tvm.apache.org/docs/api/python/te.html#tvm.te.Stage) that 
corresponds to a
+[TE 
operation](https://tvm.apache.org/docs/api/doxygen/classtvm_1_1te_1_1Operation.html),
+defined by 
[`te.compute(...)`](https://tvm.apache.org/docs/api/python/te.html#tvm.te.compute).
+The rules analyze the TE operations and apply an [internal DSL]() to 
manipulating its internal IR,
+which is in the end mapped to TE schedule primitives. This process is called 
*sketch generation*.
+
+Composite schedule rules work in a similar way scheduling TensorIR, as 
introduced in Section 3.2.
+It analyzes the TensorIR and apply schedule primitives directly to TensorIR 
accordingly.
+When applying such rules to each TensorIR block in certain order (Post-DFS is 
provided as the
+builtin order, but customization is allowed),
+it generates a sequence of schedule primitives.
+This process corresponds to the *sketch generation* phase in AutoScheduler.
+If sampling instructions are present in this sequence,
+then a design space is defined by those instructions for the meta schedule to 
explore.
+This process is similar to the *random annotation* phase in AutoScheduler.
+
+Several built-in composite schedule rules are shipped with meta schedule to 
align with the design
+space generated by AutoScheduler:
+
+* Multi-level tiling
+* Inline pure spatial blocks
+* Parallelize & vectorize & unroll
+* Auto tensorize
+
+Developers may implement their own rules in either Python or C++. They may 
specify which rules to
+use with the following syntax:
+
+```python
+from tvm import meta_schedule as ms
+
+design_space_generator = ms.PostOrderApply(rules=[
+    ms.MultiLevelTiling(...),
+    CustomRule(...),
+    ms.OtherBuiltinRules(...),
+])
+
+```
+
+### 3.5. Unifying manual schedule / AutoTVM / AutoScheduler
+
+This subsection shows that the design space induced by TE manual schedule, 
AutoTVM and AutoScheduler
+are all subsets of meta schedule, and meta schedule further allows mixing 
those three styles to
+search jointly.
+
+- **Manual schedule**. The TE schedule is a special case of a meta schedule 
program, where there is no
+randomness introduced by sampling instructions. It is a single point in terms 
of design space.
+- **AutoTVM (Template-based tuning)**. It is more natural representation of 
AutoTVM’s schedule
+templates (knobs) by writing one or more schedule functions in meta schedule 
with sampling
+instructions. The probability space supported by the sampling instructions is 
the design space to
+be explored.
+- **AutoScheduler (Template-free tuning)**. As mentioned in the previous 
section, application
+  of composite schedule rules generates the design space, which is equivalent 
to AutoScheduler’s
+  sketch generation.
+- **Mixing styles in design space definition**. By taking union of the spaces 
induced by the three
+  special cases, meta schedule allows developers to combine generic rules that 
AutoScheduler
+  provides and operator-specific scheduling.
+
+## 4. Reference-level explanation
+
+This section introduces the underlying techniques for the automation system to 
extract and
+explore the design space. The figure below briefly illustrates the workflow of 
the system:
+
+![meta-schedule-workflow](../resources/meta-schedule-workflow.png)
+
+### 4.1. Execution trace as the design space
+
+**Trace**. To represent the design space defined by the meta schedule DSL, the 
underlying system
+records all the instructions users applied to the schedule class, including 
sampling and schedule
+primitives. This list of scheduling instructions being invoked, along with the 
random decisions made
+on sampling instructions, is called a trace.
+
+For instance, executing the example above results in the following trace:
+
+```
+Instruction 0. Sample-Perfect-Tile
+Instruction 1. Sample-Perfect-Tile
+Instruction 2. Sample-Perfect-Tile
+Instruction 3. Split
+Instruction 4. Split
+Instruction 5. Split
+Instruction 6. Reorder
+```
+
+**Trace forms design space.** A trace may contain zero or more sampling 
instructions, which
+introduces the uncertainty in scheduling - one instance of sampling results in 
one point in the
+design space. Therefore, the trace itself forms a design space to explore, 
e.g. which set of tile
+sizes works best on a specific hardware.
+
+**Union of design space**. Meta schedule works on a set of traces, 
representing the union of the design
+spaces represented by every single trace.
+
+**Fork a trace**. When two different decisions in the scheduling process are 
equally important to
+generate high-performance schedules, it is allowed to fork the trace into two, 
and the design space is
+the union of the forked traces.
+
+The trace is not strictly user-facing, but can be accessed and printed with 
the following syntax:

Review comment:
       if this isn't user-facing, how is the user supposed to invoke Meta 
Scheduler in a repeatable way? is there a serialization mechanism provided for 
the implementation (in particular `decisions` dict below)?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [tvm-rfcs] areusch commented on a change in pull request #5: [RFC] Meta Schedule (AutoTIR)

Reply via email to