areusch commented on a change in pull request #5: URL: https://github.com/apache/tvm-rfcs/pull/5#discussion_r677029066
########## File path: rfcs/0001-meta-schedule-autotensorir.md ########## @@ -0,0 +1,604 @@ +<!--- Licensed to the Apache Software Foundation (ASF) under one --> +<!--- or more contributor license agreements. See the NOTICE file --> +<!--- distributed with this work for additional information --> +<!--- regarding copyright ownership. The ASF licenses this file --> +<!--- to you under the Apache License, Version 2.0 (the --> +<!--- "License"); you may not use this file except in compliance --> +<!--- with the License. You may obtain a copy of the License at --> + +<!--- http://www.apache.org/licenses/LICENSE-2.0 --> + +<!--- Unless required by applicable law or agreed to in writing, --> +<!--- software distributed under the License is distributed on an --> +<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY --> +<!--- KIND, either express or implied. See the License for the --> +<!--- specific language governing permissions and limitations --> +<!--- under the License. --> + +* Feature Name: Meta Schedule (Formerly AutoTIR) +* Start Date: 2021-05-28 +* RFC PR: https://github.com/apache/tvm-rfcs/pull/5/ +* GitHub Issue: https://github.com/apache/tvm/issues/8473 + +## 1. Summary + +This proposal introduces Meta Schedule: a scheduling DSL on TIR that unifies the +approaches of AutoTVM [1] and AutoScheduler [2]. Meta schedule provides a pragmatic way to define +the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like +tensorization and loop partitioning, and customizability on every layer of the automation system. + +Meta Schedule is the 3rd generation automatic scheduling system. + +## 2. Motivation + +**Scheduling.** In TVM, +both TensorIR and TE allow direct or indirect IR transformation guided by a Review comment: can we specify more about the IR transformation here? e.g. we need to answer the question: transform from what to what? otherwise you're confused unless you know TVM already. I think it's confusing to explain because the transformation can either be from TE -> TensorIR or TensorIR -> TensorIR. here's a stab: "TVM initially describes all model operators using an abstract description of the computation. Such abstractions can be described either in the Tensor Expression IR (the standard prior to Meta Scheduling) or in TensorIR (as a naïve computation). Through a process known as **scheduling**, TVM allows transformation of these IR to an imperative, optimized description of the implmented computation. Such transformation is guided by a developer-provided program..." ########## File path: rfcs/0001-meta-schedule-autotensorir.md ########## @@ -0,0 +1,604 @@ +<!--- Licensed to the Apache Software Foundation (ASF) under one --> +<!--- or more contributor license agreements. See the NOTICE file --> +<!--- distributed with this work for additional information --> +<!--- regarding copyright ownership. The ASF licenses this file --> +<!--- to you under the Apache License, Version 2.0 (the --> +<!--- "License"); you may not use this file except in compliance --> +<!--- with the License. You may obtain a copy of the License at --> + +<!--- http://www.apache.org/licenses/LICENSE-2.0 --> + +<!--- Unless required by applicable law or agreed to in writing, --> +<!--- software distributed under the License is distributed on an --> +<!--- "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY --> +<!--- KIND, either express or implied. See the License for the --> +<!--- specific language governing permissions and limitations --> +<!--- under the License. --> + +* Feature Name: Meta Schedule (Formerly AutoTIR) +* Start Date: 2021-05-28 +* RFC PR: https://github.com/apache/tvm-rfcs/pull/5/ +* GitHub Issue: https://github.com/apache/tvm/issues/8473 + +## 1. Summary + +This proposal introduces Meta Schedule: a scheduling DSL on TIR that unifies the +approaches of AutoTVM [1] and AutoScheduler [2]. Meta schedule provides a pragmatic way to define +the space of automatic tuning, extensibility in terms of all possible TIR schedule primitives like +tensorization and loop partitioning, and customizability on every layer of the automation system. + +Meta Schedule is the 3rd generation automatic scheduling system. + +## 2. Motivation + +**Scheduling.** In TVM, +both TensorIR and TE allow direct or indirect IR transformation guided by a +developer-provided program, for example, +specifying a particular reordering of loops for better locality, +or tensorizing a compute region with specific hardware intrinsics. +The process of invoking such a set of pre-defined transformations is called "**scheduling**", +and each of such transformations is called a "**schedule primitive**". +These primitives form a domain-specific language (DSL). + +**Design space.** The set of all possible schedulings of a TE/TensorIR is called its design space. +Optimization in TVM is essentially exploring such space to find out a scheduling that transforms the +IR to generate the kernel with optimal performance. + +### Problems with the current scheduling system + +Currently there are 3 sets of scheduling APIs in TVM: +* **Manual schedule**: Developers optimize their programs by manually invoking schedule primitives, + i.e. explore points in the design space with humans in the loop. This can be a tedious and + error-prone approach, hence the creation of AutoTVM and AutoScheduler. +* **AutoTVM**: The automation system requires users to define the design space through + per-operator "schedule templates." Therefore, programmer time is a bottleneck in scaling + to hundreds of operators across many hardware platforms. + hardware platforms. +* **AutoScheduler**: It automatically generates schedule templates as the design space, + according to a set of predefined "search rules". However, it is non-trivial to extend + AutoScheduler to new schedule primitives (tensorize, loop partition, software pipelining, etc). +* The three systems above have isolated sets of APIs with several layers of their own abstraction, + which are not only hard to learn, but also engineering-intensive to customize. + +### Benefits of Meta Schedule + +The existing three scheduling systems are mutually incompatible with each other in terms of API +design and divergence: besides manual TE scheduling, AutoTVM requires users to learn a new set of +APIs, and AutoScheduler brings in another set of C++-based search rules. It adds the users' mental +overhead to understand and extend the existing systems. Further, the inability to switch between +template-based and template-free auto-tuning could lead to inferior customizability and hence +make it needlessly difficult to achieve optimal performance. + +Meta schedule provides: +* Succinct syntax, consistent APIs to TensorIR schedule with no other layer of abstraction. +* Unified APIs for implementing manual schedules, AutoTVM-style schedules, and AutoScheduler-style + schedules. +* Extensibility of all the schedule primitives, including tensorization and loop partitioning. + Almost no extra effort is needed to use a new primitive in auto-tuning. +* The automation infrastructure is extensible on every of its components. Every component of the + system can be customized easily in pure python or C++ or both. For example, one can develop a new + design space generator in python, a new ProgramRunner in python, etc. + + +## 3. Guide-level explanation + +Meta Schedule DSL is a language that provides TVM backend developers +a flexible way to define or auto-generate the operator design space. + +This section introduces the syntax of Meta Schedule DSL by describing the 5 common usage patterns +envisioned by this RFC. These patterns are: +1) Manually constructing a schedule using existing schedule primitives (Section 3.1); +2) Defining composite schedule to simplify the ap sequence of schedule primitives (Section 3.2); +3) Describing a design space of possible schedules, +a.k.a. AutoTVM-style schedule templates (Section 3.3); +4) Automatically generating the design space, a.k.a. AutoScheduler-style search rules (Section 3.4); +5) Mixing the usage of manual schedule, AutoTVM and AutoScheduler-style design space specification +in Meta Schedule (Section 3.5). + +### 3.1. Manual Schedule + +Meta schedule APIs are almost the same as TE or TensorIR scheduling. Here is an example of a manual +schedule for matrix multiplication: + +```python +# Designate a set of tile sizes +i_tiles = [16, 8, 8, 8] +j_tiles = [16, 8, 8, 8] +k_tiles = [256, 8] + +# Tile the loops according to the tile sizes +i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles) +j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles) +k_0, k_1 = sch.split(loop=k, factors=k_tiles) + +# Organize the loops into "SSRSRS" 6-level tiles +sch.reorder( + i_0, j_0, # S: the 1st spatial tile + i_1, j_1, # S: the 2nd spatial tile + k_0, # R: the 1st reduction tile + i_2, j_2, # S: the 3rd spatial tile + k_1, # R: the 2nd reduction tile + i_3, j_3, # S: the 4th spatial tile +) +``` + +In this manual scheduling example, the developers tweak the tile sizes and measure the performance of the +generated kernels to explore the opportunities of potential optimization. + +Generally speaking, while writing a schedule, there are often some parameters that are hard to +determine ahead of time, for example, tile sizes, unroll steps, or which tensor intrinsics to use. +Developers may manually enumerate possible combinations of these unknown factors, and then pick the +best schedule according to measurement results on their device. + +### 3.2. Composite Schedule Rules + +As introduced in the previous section, in TensorIR, each schedule primitive handles only a very +basic transformation of the IR. For example, `split` only splits a loop into two new loops. In the +real world, the over-fine granularity of those primitives usually leads to repetitive and verbose +scheduling code. Take the code snippet in the previous section as an example: a sequence of `split`s +are invoked, followed by a `reorder`. Taken together these 4 primitives are colloquially known as +"SSRSRS" tiling. + +To make it more convenient and modular, users are allowed to register **composite schedules** that apply +a sequence of schedule primitives according to certain analysis of the IR. +The word **composite** here means the schedule transformation is *composed* of those **primitives**. + +For example, suppose there is a composite schedule called `Inline-Elementwise-Operation`, which +inlines elementwise computation into their consumers if possible. Applying it to the +following TensorIR: + +```python [email protected] +def example_func(...): + for i, j in ...: + with tir.Block("B") ...: + B[i, j] = A[i, j] + 1 + for i, j in ...: + with tir.Block("C") ...: + C[i, j] = B[i, j] + 1 + for i, j in ...: + with tir.Block("D") ...: + D[i, j] = C[i, j] + 1 + +sch = tir.Schedule(example_func) +# `InlineElementwiseOperation` is a composite schedule rule that analyzes a given block. +# If the block contains only elementwise computation, and can be inlined into its consumer, +# then `sch.compute_inline` is called on that block. +inliner = InlineElementwiseOperation() +inliner.apply(schedule=sch, block=sch.get_block("B")) +inliner.apply(schedule=sch, block=sch.get_block("C")) +inliner.apply(schedule=sch, block=sch.get_block("D")) +``` + +Below is the result after applying this composite schedule, and its corresponding trace: + +```python + +>>> print(tvm.script.asscript(sch.mod)) [email protected] +def example_func(...): + for i, j in ...: + with tir.Block("D") ...: + D[i, j] = A[i, j] + 1 + 1 + 1 + +>>> print(sch.trace) +# Block "B" is elementwise and inlinable, then `sch.compute_inline(B)` is called +B = sch.get_block("B") +sch.compute_inline(B) +# Block "C" is elementwise and inlinable, then `sch.compute_inline(C)` is called +C = sch.get_block("C") +sch.compute_inline(C) +# Block "D" is elementwise but does not have a consumer, +# so the rule does not call `compute_inline` because it is not inlinable +D = sch.get_block("D") +``` + +### 3.3. AutoTVM-style Design Space Description + +Meta schedule extends the schedule DSL with a set of new schedule primitives +called **sampling instructions**. These primitives do not transform the TensorIR, +but instead introduce random statistical variables which can be referenced later in scheduling +to parameterize the schedule. Incorporating **sampling instructions** into a operator's schedule +allows the backend developers to succinctly describe a design space in terms of +tiling strategies, fusion levels, unroll lengths, etc. + +The matmul example above is extended to cover all possible tilings using these sampling +instructions: + +```python +# Sample tile sizes +i_tiles = sch.sample_perfect_tile(i, n=4) # was: i_tiles = [16, 8, 8, 8] +j_tiles = sch.sample_perfect_tile(j, n=4) # was: j_tiles = [16, 8, 8, 8] +k_tiles = sch.sample_perfect_tile(k, n=2) # was: k_tiles = [256, 8] +# Tile the loops according to the random variables +i_0, i_1, i_2, i_3 = sch.split(loop=i, factors=i_tiles) +j_0, j_1, j_2, j_3 = sch.split(loop=j, factors=j_tiles) +k_0, k_1 = sch.split(loop=k, factors=k_tiles) +# Organize the loops into "SSRSRS" 6-level tiles +sch.reorder( + i_0, j_0, # S: the 1st spatial tile + i_1, j_1, # S: the 2nd spatial tile + k_0, # R: the 1st reduction tile + i_2, j_2, # S: the 3rd spatial tile + k_1, # R: the 2nd reduction tile + i_3, j_3, # S: the 4th spatial tile +) +``` + +### 3.4. AutoScheduler-style Design Space Generation + +To generate design space, AutoScheduler applies a set of rules to each +[TE stage](https://tvm.apache.org/docs/api/python/te.html#tvm.te.Stage) that corresponds to a +[TE operation](https://tvm.apache.org/docs/api/doxygen/classtvm_1_1te_1_1Operation.html), +defined by [`te.compute(...)`](https://tvm.apache.org/docs/api/python/te.html#tvm.te.compute). +The rules analyze the TE operations and apply an [internal DSL]() to manipulating its internal IR, +which is in the end mapped to TE schedule primitives. This process is called *sketch generation*. + +Composite schedule rules work in a similar way scheduling TensorIR, as introduced in Section 3.2. +It analyzes the TensorIR and apply schedule primitives directly to TensorIR accordingly. +When applying such rules to each TensorIR block in certain order (Post-DFS is provided as the +builtin order, but customization is allowed), +it generates a sequence of schedule primitives. +This process corresponds to the *sketch generation* phase in AutoScheduler. +If sampling instructions are present in this sequence, +then a design space is defined by those instructions for the meta schedule to explore. +This process is similar to the *random annotation* phase in AutoScheduler. + +Several built-in composite schedule rules are shipped with meta schedule to align with the design +space generated by AutoScheduler: + +* Multi-level tiling +* Inline pure spatial blocks +* Parallelize & vectorize & unroll +* Auto tensorize + +Developers may implement their own rules in either Python or C++. They may specify which rules to +use with the following syntax: + +```python +from tvm import meta_schedule as ms + +design_space_generator = ms.PostOrderApply(rules=[ + ms.MultiLevelTiling(...), + CustomRule(...), + ms.OtherBuiltinRules(...), +]) + +``` + +### 3.5. Unifying manual schedule / AutoTVM / AutoScheduler + +This subsection shows that the design space induced by TE manual schedule, AutoTVM and AutoScheduler +are all subsets of meta schedule, and meta schedule further allows mixing those three styles to +search jointly. + +- **Manual schedule**. The TE schedule is a special case of a meta schedule program, where there is no +randomness introduced by sampling instructions. It is a single point in terms of design space. +- **AutoTVM (Template-based tuning)**. It is more natural representation of AutoTVM’s schedule +templates (knobs) by writing one or more schedule functions in meta schedule with sampling +instructions. The probability space supported by the sampling instructions is the design space to +be explored. +- **AutoScheduler (Template-free tuning)**. As mentioned in the previous section, application + of composite schedule rules generates the design space, which is equivalent to AutoScheduler’s + sketch generation. +- **Mixing styles in design space definition**. By taking union of the spaces induced by the three + special cases, meta schedule allows developers to combine generic rules that AutoScheduler + provides and operator-specific scheduling. + +## 4. Reference-level explanation + +This section introduces the underlying techniques for the automation system to extract and +explore the design space. The figure below briefly illustrates the workflow of the system: + + + +### 4.1. Execution trace as the design space + +**Trace**. To represent the design space defined by the meta schedule DSL, the underlying system +records all the instructions users applied to the schedule class, including sampling and schedule +primitives. This list of scheduling instructions being invoked, along with the random decisions made +on sampling instructions, is called a trace. + +For instance, executing the example above results in the following trace: + +``` +Instruction 0. Sample-Perfect-Tile +Instruction 1. Sample-Perfect-Tile +Instruction 2. Sample-Perfect-Tile +Instruction 3. Split +Instruction 4. Split +Instruction 5. Split +Instruction 6. Reorder +``` + +**Trace forms design space.** A trace may contain zero or more sampling instructions, which +introduces the uncertainty in scheduling - one instance of sampling results in one point in the +design space. Therefore, the trace itself forms a design space to explore, e.g. which set of tile +sizes works best on a specific hardware. + +**Union of design space**. Meta schedule works on a set of traces, representing the union of the design +spaces represented by every single trace. + +**Fork a trace**. When two different decisions in the scheduling process are equally important to +generate high-performance schedules, it is allowed to fork the trace into two, and the design space is +the union of the forked traces. + +The trace is not strictly user-facing, but can be accessed and printed with the following syntax: Review comment: if this isn't user-facing, how is the user supposed to invoke Meta Scheduler in a repeatable way? is there a serialization mechanism provided for the implementation (in particular `decisions` dict below)? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
