[tvm-site] 01/01: Add TVM Unity blog post

jroesch Wed, 15 Dec 2021 19:16:32 -0800

This is an automated email from the ASF dual-hosted git repository.

jroesch pushed a commit to branch tvm-unity
in repository https://gitbox.apache.org/repos/asf/tvm-site.git


commit 6cfb138ab5383ada9e151003822f864cd2cae62e
Author: Jared Roesch <[email protected]>
AuthorDate: Wed Dec 15 19:15:47 2021 -0800

    Add TVM Unity blog post
---
 Gemfile                        |   2 +
 _posts/2021-12-16-tvm-unity.md | 112 +++++++++++++++++++++++++++++++++++++++++
 images/tvm-unity/image1.png    | Bin 0 -> 333125 bytes
 images/tvm-unity/image2.png    | Bin 0 -> 739514 bytes
 images/tvm-unity/image3.png    | Bin 0 -> 276661 bytes
 images/tvm-unity/image4.png    | Bin 0 -> 200252 bytes
 6 files changed, 114 insertions(+)

diff --git a/Gemfile b/Gemfile
index 85ec323..dea4159 100644
--- a/Gemfile
+++ b/Gemfile
@@ -28,3 +28,5 @@ end
 # Performance-booster for watching directories on Windows
 gem "wdm", "~> 0.1.1", :platforms => [:mingw, :x64_mingw, :mswin]
 
+
+gem "webrick", "~> 1.7"
diff --git a/_posts/2021-12-16-tvm-unity.md b/_posts/2021-12-16-tvm-unity.md
new file mode 100644
index 0000000..97ad0a3
--- /dev/null
+++ b/_posts/2021-12-16-tvm-unity.md
@@ -0,0 +1,112 @@
+---
+ layout: post
+ title: "Apache TVM Unity: a vision for the ML software & hardware ecosystem 
in 2022"
+ date: 2021-12-15
+ author: Adrian Sampson, Tianqi Chen, Jared Roesch
+---
+
+Apache TVM Unity is a roadmap for the TVM ecosystem in 2022. We see a broader 
shift coming in the way that machine learning system stacks optimize for 
flexibility and agility in the face of a rapidly changing hardware landscape. 
TVM will evolve to break down the boundaries that constrain the ways current ML 
systems adapt to rapid changes in ML models and the accelerators that implement 
them.
+
+## Boundaries in the Modern ML System Stack
+
+![image](/images/tvm-unity/image4.png){: style="width: 40%; margin: auto; 
display: block;" }
+
+The system stack for modern machine learning consists of four kinds of 
abstractions:
+1. The *computational graph* abstraction encodes the flow of data between 
coarse-grained tensor operators. Computational graphs are the high-level 
abstraction users interact with in [TensorFlow](https://www.tensorflow.org/), 
[MXNet](https://mxnet.apache.org/), and [PyTorch](https://pytorch.org/).
+2. *Tensor programs* implement the code for the operators in the computational 
graph. Deep learning compilers generate the low-level C++ or CUDA code for 
computations like convolutions or matrix multiplications.
+3. Similarly, *libraries and runtimes* include pre-written code to execute and 
orchestrate tensor operations. BLAS packages and libraries like cuDNN provide 
extensively tuned operator implementations for specific hardware targets.
+4. *Hardware primitives* are at the bottom of the stack. Here, low-level 
assembly languages and hardware accelerator interfaces expose the raw 
capabilities of the machine.
+
+There are *vertical* boundaries between the abstraction levels that prohibit 
cross-layer interactions and feedback between the levels. There is also a 
*horizontal* boundary between two opposing ways that software stacks can treat 
the central tensor computation level. The horizontal boundary divides 
*library-based* and *compilation-based* approaches to tensor computation.
+
+![image](/images/tvm-unity/image1.png){: style="width: 70%; margin: auto; 
display: block;" }
+
+Library-based frameworks rely on collections of pre-made, carefully tuned 
operator implementations as their computational workhorse. Compilation-based 
frameworks instead generate their own custom tensor operation code from 
scratch.  Modern software stacks typically use one style or the other, but they 
don’t combine them: most deep learning frameworks are library-based, while most 
deep learning compilers cannot incorporate libraries and runtimes.
+
+In the current landscape of ML systems, the boundaries between these layers 
tend to be strict. Neither approach is better than the other, but they have 
trade-offs. Library-based stacks excel on standard styles of ML models because 
they benefit from years of engineering investment common operators. On the 
other side, the flexibility and automation in compilation-based frameworks can 
be better for emerging models that require new operators.
+
+Vertical boundaries exist in both styles of software stack. AI applications 
start at the top of the stack and march through the layers from top to bottom. 
Frameworks choose data layout and operator fusion strategies at the graph 
level; then the tensor computations carry out the operators selected in the 
computational graph; and these operators map onto a fixed set of hardware 
primitives. It’s a one-shot, unidirectional workflow: performance constraints 
at the level of tensor programs, fo [...]
+
+Both vertical and horizontal boundaries are slowing down the pace of 
innovation in machine learning. New hardware accelerators are emerging with new 
levels of capability and performance, but harnessing them will require fluid 
collaboration between ML scientists, ML engineers, hardware vendors that these 
boundaries prevent. To cope with the rapid pace of change in ML systems, 
frameworks need to support **incremental** evolution: Incorporating new 
capabilities should require effort proport [...]
+
+## TVM Unity
+
+The TVM Unity vision is about breaking down these barriers. The goal is to 
enable cross-layer interactions and automate their optimization. It is not to 
collapse the abstraction layers into a monolith: there is no “silver bullet” 
representation for AI programs that simultaneously enables optimization at 
every level. Instead, TVM Unity will build interfaces for the abstractions to 
interact and exchange information.
+
+Removing the strict barriers between the levels in the system stack will 
enable new kinds of optimization that work jointly across the layers. A unified 
view of the entire system will let TVM automatically co-optimize decisions in 
the computation graph, the tensor operators, and the hardware mapping to search 
for the best possible implementation of an AI application. At the same time, 
TVM Unity will also serve as a communication substrate for interactions between 
ML scientists, ML engine [...]
+
+### Unifying Abstractions
+
+![image](/images/tvm-unity/image2.png){: style="width: 70%; margin: auto; 
display: block;" }
+
+TVM Unity will focus on letting AI applications fluidly cross the boundaries 
between operator graphs, tensor programs, and hardware primitives. In TVM, a 
single Python program can define a core tensor operation, incorporate a custom 
hardware primitive, and invoke the operation from a larger operator graph.
+This example shows all of these capabilities:
+
+```python
+import tvm.script
+from tvm.script import tir as T, relax as R
+
[email protected]_module
+class MyIRModule:
+    # Define a TIR based operation.
+       @T.prim_func
+       def tir_mm(X: T.Buffer[(n, d), "float32"],
+                   W: T.Buffer[(d, m), "float32"],
+                   Y: T.Buffer[(n, m), "float32"]):
+        for i, j, k  in T.grid(n, m, d):
+            with T.block("body"):
+                vi, vj, vk = T.axis.remap("SSR", [i, j, k])
+               with T.init():
+            Y[vi, vj] = 0
+        # Can be mapped on to HW intrinsics.
+        Y[vi, vj] += X[vi, vk] * W[vk, wj]
+
+       @R.function
+       def relax_func(x: R.Tensor[(n, d), "float32"], w: R.Tensor[(d, m), 
"float32"]):
+        with R.dataflow()
+            # Invoke the TIR code.
+            lv0: R.Tensor[(n, m), "float32"] = R.call_dps((n, m), tir_mm, [x, 
w])
+            lv1: R.Tensor[(n * m,), "float32"] = R.flatten(lv0)
+            gv0: R.Tensor[lv2, "float32"] = R.exp(lv1)
+            R.output(gv0)
+
+        # Invoke external update rule.
+        R.call_packed("custom_inplace_update", gv0)
+        return gv0
+```
+
+This code has both a tensor program (`tir_mm`) and computational graph that 
includes it (`relax_func`). The high-level data flow can directly invoke the 
low-level tensor manipulation to build up a larger computation. The TVM runtime 
unifies the operator graph and compiler-based tensor computation to optimize 
the entire program. This code also uses `call_packed` to invoke a pre-baked 
operator—showing how TVM can smoothly integrate library-based operators with 
the custom computation.
+
+Additionally, TensorIR opens doors to exploit hardware primitives through 
tensorization. Tensorization transforms loop-level programs to implementations 
that map onto the primitives that a particular hardware target declares.
+
+The key to highlight here is **cross layer interactions**. Our particular 
example shows interactions between: (1) computational graph and tensor 
programs; (2) computational graph and runtime libraries; (3) Finally tensor 
programs and hardware primitives through on-going automatic tensorization 
developments in TensorIR. These cross layer interactions open doors for making 
**incremental optimizations** at the boundary. For example, we can build a 
customized pass to the lower part of the su [...]
+
+In addition to the unification of abstraction layers, we are also working on 
unifying the shape representation, to enable **first class symbolic shape 
support** across the stack. In our particular example, the symbolic shape 
dimensions(n, m) can flow across the abstractions and enable advanced 
optimizations for dynamic workloads. The additional capabilities will open 
doors for both training and inference workload optimizations.
+
+### Unifying Perspectives
+
+Better ML systems require collaboration between ML scientists, ML engineers, 
and hardware engineers. The coming era of diverse specialized ML hardware will 
require coordinated effort from teams that include all three groups. By 
building rich, bidirectional interfaces between the layers in the system stack, 
TVM Unity aims to be the medium through which this collaboration and iteration 
happens.
+
+Abstractions in TVM can catalyze the lifecycle of an improvement to an AI 
application. At the highest level, an ML scientist can specify the operator 
they need to construct the next generation of a model. ML engineers can work at 
the tensor computation level to make this new operation efficient. Finally, 
these tensor computations can rely on hardware primitives written by hardware 
engineers. The work at each level will interact through Python APIs within the 
TVM ecosystem. The ability to [...]
+
+### Automation
+
+A unified ML system creates a new, larger search space than a system stack 
with strict boundaries. Decisions within tensor computations can influence the 
structure of the operator graph, and new hardware primitives can drastically 
change the optimal mappings at every other layer.
+
+TVM Unity will expose all these cross-layer interactions for automated 
optimization. Finding the best implementation for a given application will 
require learning-driven optimization: using ML to optimize ML by exploring the 
expanded joint search space and minimize the computational cost.
+
+In addition to that, we also want to leverage domain experts’ help when 
possible, and create mechanisms to effectively incorporate domain information 
to help guide the automatic optimizations.
+
+## New Capabilities with Unity
+
+The Unity vision guides the technical roadmap for TVM’s evolution over the 
next year. The unified approach will position TVM to offer new forms of 
automation and ecosystem integration that are not possible with today’s system 
stacks.
+
+With Unity, TVM will unify library-based computation with compiler-based 
automation. AI applications will be able to combine the world’s best known code 
for common operators with automatically optimized code for computations that 
don’t map neatly onto any existing operator. Developers will be able to 
smoothly transition between both strategies without a steep “performance cliff” 
when switching from built-in to generated code. Teams will be able to iterate 
rapidly with compiled code for n [...]
+
+TVM also aims to serve as a bridge to unify the broader ML and hardware 
ecosystems. In the ML ecosystem, TVM offers a minimal runtime that does not 
constrain teams’ choice of frameworks. TVM models will be easy to embed into 
other frameworks and runtimes as subgraphs for both training and inference. 
Through exchange formats like [ONNX](https://onnx.ai/) and 
[TorchScript](https://pytorch.org/docs/stable/jit.html), TVM models can fluidly 
integrate into larger applications built on any infr [...]
+
+![image](/images/tvm-unity/image3.png){: style="width: 50%; margin: auto; 
display: block;" }
+
+Beyond TVM alone, the same forces that are driving TVM Unity exist across the 
theory and practice of modern ML. Rapid changes to models, emerging alternative 
hardware, and aging abstraction boundaries all point toward the need for an 
integrated approach. We expect TVM to lead the way into the next great 
industry-wide shift in ML systems.
+
+For more details about our vision for TVM, check out [TVMCon 
2021](https://www.tvmcon.org) for more talks and discussion.
diff --git a/images/tvm-unity/image1.png b/images/tvm-unity/image1.png
new file mode 100644
index 0000000..616a144
Binary files /dev/null and b/images/tvm-unity/image1.png differ
diff --git a/images/tvm-unity/image2.png b/images/tvm-unity/image2.png
new file mode 100644
index 0000000..a23cd2e
Binary files /dev/null and b/images/tvm-unity/image2.png differ
diff --git a/images/tvm-unity/image3.png b/images/tvm-unity/image3.png
new file mode 100644
index 0000000..4a11da3
Binary files /dev/null and b/images/tvm-unity/image3.png differ
diff --git a/images/tvm-unity/image4.png b/images/tvm-unity/image4.png
new file mode 100644
index 0000000..d8d7657
Binary files /dev/null and b/images/tvm-unity/image4.png differ

[tvm-site] 01/01: Add TVM Unity blog post

Reply via email to