This is an automated email from the ASF dual-hosted git repository.
tqchen pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/incubator-tvm-site.git
The following commit(s) were added to refs/heads/asf-site by this push:
new 2330d86 Build at Tue Nov 3 09:02:03 EST 2020
2330d86 is described below
commit 2330d862e2d490be1c9e5633de8b550e14182c52
Author: tqchen
AuthorDate: Tue Nov 3 09:02:03 2020 -0500
Build at Tue Nov 3 09:02:03 EST 2020
---
2017/08/17/tvm-release-announcement.html | 2 +-
...s-with-TVM-A-Depthwise-Convolution-Example.html | 8 +--
2017/10/06/nnvm-compiler-announcement.html | 2 +-
...s-to-TVM-Stack-and-NNVM-Compiler-with-ROCm.html | 4 +-
2017/11/08/android-rpc-introduction.html | 2 +-
2018/01/16/opt-mali-gpu.html | 2 +-
2018/03/12/webgl.html | 2 +-
2018/03/23/nmt-transformer-optimize.html | 2 +-
2018/07/12/vta-release-announcement.html | 12 ++--
2018/08/10/DLPack-Bridge.html | 4 +-
2018/10/03/auto-opt-all.html | 2 +-
2018/10/09/ml-in-tees.html | 2 +-
2018/12/18/lowprecision-conv.html | 6 +-
2019/01/19/Golang.html | 8 +--
2019/03/18/tvm-apache-announcement.html | 2 +-
2019/04/29/opt-cuda-quantized.html | 12 ++--
2019/05/30/pytorch-frontend.html | 2 +-
...machine-learning-to-webassembly-and-webgpu.html | 2 +-
2020/06/04/tinyml-how-tvm-is-taming-tiny.html | 2 +-
2020/07/14/bert-pytorch-tvm.html | 2 +-
.../15/how-to-bring-your-own-codegen-to-tvm.html | 2 +-
2020/09/26/bring-your-own-datatypes.html | 2 +-
atom.xml | 76 ++++++++++-----------
feed.xml | 40 +++++------
rss.xml | 78 +++++++++++-----------
25 files changed, 139 insertions(+), 139 deletions(-)
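In the hunks shown below, the changes follow two mechanical patterns.

- Repository links move from `https://github.com/dmlc/tvm/blob/master/` to `https://github.com/apache/incubator-tvm/blob/main/`. VTA image links move from `uwsaml/web-data` to `uwsampl/web-data`.
- Post timestamps are re-rendered with a UTC offset of `-04:00`/`-05:00` instead of `-07:00`/`-08:00`.

The Python sketch below is hypothetical. `rewrite_links`, `shift_hours` and `LINK_REWRITES` are illustrative names inferred from the hunks, not the tooling that produced this build. It reproduces the link substitution. It also shows what the offset change does to each timestamp:

- A post with an explicit time, such as `12:00:00-07:00` becoming `15:00:00-04:00`, keeps the same instant.
- A post stamped at midnight keeps the wall-clock time and moves three hours earlier.

```python
from datetime import datetime

# Hypothetical prefix substitutions, inferred from the -/+ pairs in this diff.
LINK_REWRITES = [
    ("https://github.com/dmlc/tvm/blob/master/",
     "https://github.com/apache/incubator-tvm/blob/main/"),
    ("https://raw.githubusercontent.com/uwsaml/web-data/",
     "https://raw.githubusercontent.com/uwsampl/web-data/"),
]


def rewrite_links(html: str) -> str:
    """Apply each old-prefix to new-prefix substitution seen in the diff."""
    for old, new in LINK_REWRITES:
        html = html.replace(old, new)
    return html


def shift_hours(old_stamp: str, new_stamp: str) -> float:
    """How far the absolute instant moved between two ISO 8601 stamps.

    Offset-aware datetimes subtract in UTC, so 0.0 means only the
    displayed offset changed and the instant is unchanged.
    """
    delta = datetime.fromisoformat(new_stamp) - datetime.fromisoformat(old_stamp)
    return delta.total_seconds() / 3600


if __name__ == "__main__":
    old_link = '<a href="https://github.com/dmlc/tvm/blob/master/golang/sample">Examples</a>'
    print(rewrite_links(old_link))
    # 2017/08/17 post has an explicit time: same instant, new offset.
    print(shift_hours("2017-08-17T12:00:00-07:00", "2017-08-17T15:00:00-04:00"))  # 0.0
    # 2017/08/22 post is stamped at midnight: the instant moves 3 hours earlier.
    print(shift_hours("2017-08-22T00:00:00-07:00", "2017-08-22T00:00:00-04:00"))  # -3.0
```

Matching on full URL prefixes rather than on `dmlc/tvm` alone keeps the substitution from touching unrelated text. Note that `uwsampl` does not contain `uwsaml` as a substring, so running the rewrite twice is harmless.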
diff --git a/2017/08/17/tvm-release-announcement.html b/2017/08/17/tvm-release-announcement.html
index e5ee2d1..9b83eb3 100644
--- a/2017/08/17/tvm-release-announcement.html
+++ b/2017/08/17/tvm-release-announcement.html
@@ -140,7 +140,7 @@
<div class="span14 w-100">
<h1>TVM: An End to End IR Stack for Deploying Deep Learning Workloads on Hardware Platforms </h1>
<p class="post-meta">
- <time datetime="2017-08-17T12:00:00-07:00" itemprop="datePublished">
+ <time datetime="2017-08-17T15:00:00-04:00" itemprop="datePublished">
Aug 17, 2017
</time>
diff --git a/2017/08/22/Optimize-Deep-Learning-GPU-Operators-with-TVM-A-Depthwise-Convolution-Example.html b/2017/08/22/Optimize-Deep-Learning-GPU-Operators-with-TVM-A-Depthwise-Convolution-Example.html
index 5d0fa56..a03a6bf 100644
--- a/2017/08/22/Optimize-Deep-Learning-GPU-Operators-with-TVM-A-Depthwise-Convolution-Example.html
+++ b/2017/08/22/Optimize-Deep-Learning-GPU-Operators-with-TVM-A-Depthwise-Convolution-Example.html
@@ -140,7 +140,7 @@
<div class="span14 w-100">
<h1>Optimize Deep Learning GPU Operators with TVM: A Depthwise Convolution Example </h1>
<p class="post-meta">
- <time datetime="2017-08-22T00:00:00-07:00" itemprop="datePublished">
+ <time datetime="2017-08-22T00:00:00-04:00" itemprop="datePublished">
Aug 22, 2017
</time>
@@ -705,9 +705,9 @@ Below is the result with Input = [1, 256, 96, 96], Filter = [256, 1, 3, 3], stri
<h2 id="show-me-the-code">Show me the code</h2>
<ul>
- <li>Declare: <a href="https://github.com/dmlc/tvm/blob/master/topi/python/topi/nn/depthwise_conv2d.py">https://github.com/dmlc/tvm/blob/master/topi/python/topi/nn/depthwise_conv2d.py</a></li>
- <li>Schedule: <a href="https://github.com/dmlc/tvm/blob/master/topi/python/topi/cuda/depthwise_conv2d.py">https://github.com/dmlc/tvm/blob/master/topi/python/topi/cuda/depthwise_conv2d.py</a></li>
- <li>Test: <a href="https://github.com/dmlc/tvm/blob/master/topi/recipe/conv/depthwise_conv2d_test.py">https://github.com/dmlc/tvm/blob/master/topi/recipe/conv/depthwise_conv2d_test.py</a></li>
+ <li>Declare: <a href="https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/nn/depthwise_conv2d.py">https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/nn/depthwise_conv2d.py</a></li>
+ <li>Schedule: <a href="https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/cuda/depthwise_conv2d.py">https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/cuda/depthwise_conv2d.py</a></li>
+ <li>Test: <a href="https://github.com/apache/incubator-tvm/blob/main/topi/recipe/conv/depthwise_conv2d_test.py">https://github.com/apache/incubator-tvm/blob/main/topi/recipe/conv/depthwise_conv2d_test.py</a></li>
</ul>
<h2 id="acknowledgements">Acknowledgements</h2>
diff --git a/2017/10/06/nnvm-compiler-announcement.html b/2017/10/06/nnvm-compiler-announcement.html
index d7b9c05..d3eb49f 100644
--- a/2017/10/06/nnvm-compiler-announcement.html
+++ b/2017/10/06/nnvm-compiler-announcement.html
@@ -140,7 +140,7 @@
<div class="span14 w-100">
<h1>NNVM Compiler: Open Compiler for AI Frameworks </h1>
<p class="post-meta">
- <time datetime="2017-10-06T08:30:00-07:00" itemprop="datePublished">
+ <time datetime="2017-10-06T11:30:00-04:00" itemprop="datePublished">
Oct 6, 2017
</time>
diff --git a/2017/10/30/Bringing-AMDGPUs-to-TVM-Stack-and-NNVM-Compiler-with-ROCm.html b/2017/10/30/Bringing-AMDGPUs-to-TVM-Stack-and-NNVM-Compiler-with-ROCm.html
index eb4caed..1b48741 100644
--- a/2017/10/30/Bringing-AMDGPUs-to-TVM-Stack-and-NNVM-Compiler-with-ROCm.html
+++ b/2017/10/30/Bringing-AMDGPUs-to-TVM-Stack-and-NNVM-Compiler-with-ROCm.html
@@ -140,7 +140,7 @@
<div class="span14 w-100">
<h1>Bringing AMDGPUs to TVM Stack and NNVM Compiler with ROCm </h1>
<p class="post-meta">
- <time datetime="2017-10-30T00:00:00-07:00" itemprop="datePublished">
+ <time datetime="2017-10-30T00:00:00-04:00" itemprop="datePublished">
Oct 30, 2017
</time>
@@ -204,7 +204,7 @@ TVM prediction top-1: 282 tiger cat</code></pre></figure>
<h2 id="a-note-on-performance">A Note on performance</h2>
-<p>The current support on ROCm focuses on the functionality coverage. We have already seen promising performance results by simply adopting existing TVM schedules for CUDA backend. For example, you can try running <a href="https://github.com/dmlc/tvm/blob/master/topi/recipe/gemm/cuda_gemm_square.py">the gemm test script</a> in the TVM repository and see the result. For two types of cards we tested, the current gemm recipe for square matrix multiplication (not yet specifically optimized f [...]
+<p>The current support on ROCm focuses on the functionality coverage. We have already seen promising performance results by simply adopting existing TVM schedules for CUDA backend. For example, you can try running <a href="https://github.com/apache/incubator-tvm/blob/main/topi/recipe/gemm/cuda_gemm_square.py">the gemm test script</a> in the TVM repository and see the result. For two types of cards we tested, the current gemm recipe for square matrix multiplication (not yet specifically o [...]
This is already a promising start, as it is very hard to optimize performance to get to peak and we
did not yet apply AMD GPU specific optimizations.
We are starting to look at performance optimization and we expect more improvement to come.</p>
diff --git a/2017/11/08/android-rpc-introduction.html b/2017/11/08/android-rpc-introduction.html
index d354c0a..104829e 100644
--- a/2017/11/08/android-rpc-introduction.html
+++ b/2017/11/08/android-rpc-introduction.html
@@ -140,7 +140,7 @@
<div class="span14 w-100">
<h1>Remote Profile and Test Deep Learning Cross Compilation on Mobile Phones with TVM RPC </h1>
<p class="post-meta">
- <time datetime="2017-11-08T00:00:00-08:00" itemprop="datePublished">
+ <time datetime="2017-11-08T00:00:00-05:00" itemprop="datePublished">
Nov 8, 2017
</time>
diff --git a/2018/01/16/opt-mali-gpu.html b/2018/01/16/opt-mali-gpu.html
index 814ea6e..71d3d86 100644
--- a/2018/01/16/opt-mali-gpu.html
+++ b/2018/01/16/opt-mali-gpu.html
@@ -140,7 +140,7 @@
<div class="span14 w-100">
<h1>Optimizing Mobile Deep Learning on ARM GPU with TVM </h1>
<p class="post-meta">
- <time datetime="2018-01-16T00:00:00-08:00" itemprop="datePublished">
+ <time datetime="2018-01-16T00:00:00-05:00" itemprop="datePublished">
Jan 16, 2018
</time>
diff --git a/2018/03/12/webgl.html b/2018/03/12/webgl.html
index 81e89ac..db05f52 100644
--- a/2018/03/12/webgl.html
+++ b/2018/03/12/webgl.html
@@ -140,7 +140,7 @@
<div class="span14 w-100">
<h1>Compiling Deep Learning Models to WebGL with TVM </h1>
<p class="post-meta">
- <time datetime="2018-03-12T00:00:00-07:00" itemprop="datePublished">
+ <time datetime="2018-03-12T00:00:00-04:00" itemprop="datePublished">
Mar 12, 2018
</time>
diff --git a/2018/03/23/nmt-transformer-optimize.html b/2018/03/23/nmt-transformer-optimize.html
index 7dd4172..9ec078f 100644
--- a/2018/03/23/nmt-transformer-optimize.html
+++ b/2018/03/23/nmt-transformer-optimize.html
@@ -140,7 +140,7 @@
<div class="span14 w-100">
<h1>Bringing TVM into TensorFlow for Optimizing Neural Machine Translation on GPU </h1>
<p class="post-meta">
- <time datetime="2018-03-23T00:00:00-07:00" itemprop="datePublished">
+ <time datetime="2018-03-23T00:00:00-04:00" itemprop="datePublished">
Mar 23, 2018
</time>
diff --git a/2018/07/12/vta-release-announcement.html b/2018/07/12/vta-release-announcement.html
index a4b1dd0..7155faa 100644
--- a/2018/07/12/vta-release-announcement.html
+++ b/2018/07/12/vta-release-announcement.html
@@ -140,7 +140,7 @@
<div class="span14 w-100">
<h1>VTA: An Open, Customizable Deep Learning Acceleration Stack </h1>
<p class="post-meta">
- <time datetime="2018-07-12T00:00:00-07:00" itemprop="datePublished">
+ <time datetime="2018-07-12T00:00:00-04:00" itemprop="datePublished">
Jul 12, 2018
</time>
@@ -158,7 +158,7 @@
<p>VTA is more than a standalone accelerator design: it’s an end-to-end solution that includes drivers, a JIT runtime, and an optimizing compiler stack based on TVM. The current release includes a behavioral hardware simulator, as well as the infrastructure to deploy VTA on low-cost FPGA hardware for fast prototyping. By extending the TVM stack with a customizable, and open source deep learning hardware accelerator design, we are exposing a transparent end-to-end deep learning stack from [...]
-<p style="text-align: center"><img src="https://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_stack.png" alt="image" width="50%" /></p>
+<p style="text-align: center"><img src="https://raw.githubusercontent.com/uwsampl/web-data/master/vta/blogpost/vta_stack.png" alt="image" width="50%" /></p>
<p>The VTA and TVM stack together constitute a blueprint for end-to-end, accelerator-centric deep learning system that can:</p>
@@ -213,7 +213,7 @@ The extendability of the compiler stack, combined with the ability to modify the
<p>The Vanilla Tensor Accelerator (VTA) is a generic deep learning accelerator built around a GEMM core, which performs dense matrix multiplication at a high computational throughput.
The design is inspired by mainstream deep learning accelerators, of the likes of Google’s TPU accelerator. The design adopts decoupled access-execute to hide memory access latency and maximize utilization of compute resources. To a broader extent, VTA can serve as a template deep learning accelerator design, exposing a clean tensor computation abstraction to the compiler stack.</p>
-<p style="text-align: center"><img src="https://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_overview.png" alt="image" width="60%" /></p>
+<p style="text-align: center"><img src="https://raw.githubusercontent.com/uwsampl/web-data/master/vta/blogpost/vta_overview.png" alt="image" width="60%" /></p>
<p>The figure above presents a high-level overview of the VTA hardware organization. VTA is composed of four modules that communicate between each other via FIFO queues and single-writer/single-reader SRAM memory blocks, to allow for task-level pipeline parallelism.
The compute module performs both dense linear algebra computation with its GEMM core, and general computation with its tensor ALU.
@@ -230,7 +230,7 @@ The first approach, which doesn’t require special hardware is to run deep lear
This simulator back-end is readily available for developers to experiment with.
The second approach relies on an off-the-shelf and low-cost FPGA development board – the <a href="http://www.pynq.io/">Pynq board</a>, which exposes a reconfigurable FPGA fabric and an ARM SoC.</p>
-<p style="text-align: center"><img src="https://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_system.png" alt="image" width="70%" /></p>
+<p style="text-align: center"><img src="https://raw.githubusercontent.com/uwsampl/web-data/master/vta/blogpost/vta_system.png" alt="image" width="70%" /></p>
<p>The VTA release offers a simple compilation and deployment flow of the VTA hardware design and TVM workloads on the Pynq platform, with the help of an RPC server interface.
The RPC server handles FPGA reconfiguration tasks and TVM module invocation offloading onto the VTA runtime.
@@ -253,7 +253,7 @@ While this platform is meant for prototyping (the 2012 FPGA cannot compete with
<p>A popular method used to assess the efficient use of hardware are roofline diagrams: given a hardware design, how efficiently are different workloads utilizing the hardware compute and memory resources. The roofline plot below shows the throughput achieved on different convolution layers of the ResNet-18 inference benchmark. Each layer has a different arithmetic intensity, i.e. compute to data movement ratio.
In the left half, convolution layers are bandwidth limited, whereas on the right half, they are compute limited.</p>
-<p style="text-align: center"><img src="https://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_roofline.png" alt="image" width="60%" /></p>
+<p style="text-align: center"><img src="https://raw.githubusercontent.com/uwsampl/web-data/master/vta/blogpost/vta_roofline.png" alt="image" width="60%" /></p>
<p>The goal behind designing a hardware architecture, and a compiler stack is to bring each workload as close as possible to the roofline of the target hardware.
The roofline plot shows the effects of having the hardware and compiler work together to maximize utilization of the available hardware resources.
@@ -262,7 +262,7 @@ The result is an overall higher utilization of the available compute and memory
<h3 id="end-to-end-resnet-18-evaluation">End to end ResNet-18 evaluation</h3>
-<p style="text-align: center"><img src="https://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_e2e.png" alt="image" width="60%" /></p>
+<p style="text-align: center"><img src="https://raw.githubusercontent.com/uwsampl/web-data/master/vta/blogpost/vta_e2e.png" alt="image" width="60%" /></p>
<p>A benefit of having a complete compiler stack built for VTA is the ability to run end-to-end workloads. This is compelling in the context of hardware acceleration because we need to understand what performance bottlenecks, and Amdahl limitations stand in the way to obtaining faster performance.
The bar plot above shows inference performance with and without offloading the ResNet convolutional layers to the FPGA-based VTA design, on the Pynq board’s ARM Cortex A9 SoC.
diff --git a/2018/08/10/DLPack-Bridge.html b/2018/08/10/DLPack-Bridge.html
index b64eead..0ec196d 100644
--- a/2018/08/10/DLPack-Bridge.html
+++ b/2018/08/10/DLPack-Bridge.html
@@ -140,7 +140,7 @@
<div class="span14 w-100">
<h1>Building a Cross-Framework Deep Learning Compiler via DLPack </h1>
<p class="post-meta">
- <time datetime="2018-08-10T00:00:00-07:00" itemprop="datePublished">
+ <time datetime="2018-08-10T00:00:00-04:00" itemprop="datePublished">
Aug 10, 2018
</time>
@@ -262,7 +262,7 @@ found <a href="https://tvm.apache.org/docs//tutorials/optimize/opt_gemm.html">he
</code></pre></div></div>
<h2 id="under-the-hood-of-the-pytorch-example">Under the hood of the PyTorch Example</h2>
-<p>As TVM provides <a href="https://github.com/dmlc/tvm/blob/master/include/tvm/runtime/c_runtime_api.h#L455">functions</a> to convert dlpack tensors to tvm <code class="language-plaintext highlighter-rouge">NDArray</code>s and
+<p>As TVM provides <a href="https://github.com/apache/incubator-tvm/blob/main/include/tvm/runtime/c_runtime_api.h#L455">functions</a> to convert dlpack tensors to tvm <code class="language-plaintext highlighter-rouge">NDArray</code>s and
vice-versa, so all that is needed is some syntactic sugar by wrapping functions.
<code class="language-plaintext highlighter-rouge">convert_func</code> is a generic converter for frameworks using tensors with dlpack
support, and can be used to implement convenient converters, such as
diff --git a/2018/10/03/auto-opt-all.html b/2018/10/03/auto-opt-all.html
index 005b8fc..f5f1482 100644
--- a/2018/10/03/auto-opt-all.html
+++ b/2018/10/03/auto-opt-all.html
@@ -140,7 +140,7 @@
<div class="span14 w-100">
<h1>Automatic Kernel Optimization for Deep Learning on All Hardware Platforms </h1>
<p class="post-meta">
- <time datetime="2018-10-03T00:00:00-07:00" itemprop="datePublished">
+ <time datetime="2018-10-03T00:00:00-04:00" itemprop="datePublished">
Oct 3, 2018
</time>
diff --git a/2018/10/09/ml-in-tees.html b/2018/10/09/ml-in-tees.html
index 85f637d..3838be6 100644
--- a/2018/10/09/ml-in-tees.html
+++ b/2018/10/09/ml-in-tees.html
@@ -140,7 +140,7 @@
<div class="span14 w-100">
<h1>Efficient Privacy-Preserving ML Using TVM </h1>
<p class="post-meta">
- <time datetime="2018-10-09T00:00:00-07:00" itemprop="datePublished">
+ <time datetime="2018-10-09T00:00:00-04:00" itemprop="datePublished">
Oct 9, 2018
</time>
diff --git a/2018/12/18/lowprecision-conv.html b/2018/12/18/lowprecision-conv.html
index e1fafc9..31738a3 100644
--- a/2018/12/18/lowprecision-conv.html
+++ b/2018/12/18/lowprecision-conv.html
@@ -140,7 +140,7 @@
<div class="span14 w-100">
<h1>Automating Generation of Low Precision Deep Learning Operators </h1>
<p class="post-meta">
- <time datetime="2018-12-18T00:00:00-08:00" itemprop="datePublished">
+ <time datetime="2018-12-18T00:00:00-05:00" itemprop="datePublished">
Dec 18, 2018
</time>
@@ -292,8 +292,8 @@ Note: x86 doesn’t support a vectorized popcount for this microarchitecture, so
<h2 id="show-me-the-code">Show me the code</h2>
<ul>
- <li><a href="https://github.com/dmlc/tvm/blob/master/topi/python/topi/nn/bitserial_conv2d.py">TOPI bitserial convolution</a></li>
- <li><a href="https://github.com/dmlc/tvm/blob/master/topi/python/topi/arm_cpu/bitserial_conv2d.py">TOPI ARM cpu bitserial convolution</a></li>
+ <li><a href="https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/nn/bitserial_conv2d.py">TOPI bitserial convolution</a></li>
+ <li><a href="https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/arm_cpu/bitserial_conv2d.py">TOPI ARM cpu bitserial convolution</a></li>
</ul>
<h2 id="references">References</h2>
diff --git a/2019/01/19/Golang.html b/2019/01/19/Golang.html
index da4cdbd..87e1e11 100644
--- a/2019/01/19/Golang.html
+++ b/2019/01/19/Golang.html
@@ -140,7 +140,7 @@
<div class="span14 w-100">
<h1>TVM Golang Runtime for Deep Learning Deployment </h1>
<p class="post-meta">
- <time datetime="2019-01-19T00:00:00-08:00" itemprop="datePublished">
+ <time datetime="2019-01-19T00:00:00-05:00" itemprop="datePublished">
Jan 19, 2019
</time>
@@ -293,14 +293,14 @@ For simplicity the error handling is ignored here, but is important in real appl
</code></pre></div></div>
<p><code class="language-plaintext highlighter-rouge">gotvm</code> extends the TVM packed function system to support golang function closures as packed functions.
-<a href="https://github.com/dmlc/tvm/blob/master/golang/sample">Examples</a> available to register golang
+<a href="https://github.com/apache/incubator-tvm/blob/main/golang/sample">Examples</a> available to register golang
closure as TVM packed function and invoke the same across programming language
barriers.</p>
<h2 id="show-me-the-code">Show me the code</h2>
<ul>
- <li><a href="https://github.com/dmlc/tvm/blob/master/golang/src">Package Source</a></li>
- <li><a href="https://github.com/dmlc/tvm/blob/master/golang/sample">Examples</a></li>
+ <li><a href="https://github.com/apache/incubator-tvm/blob/main/golang/src">Package Source</a></li>
+ <li><a href="https://github.com/apache/incubator-tvm/blob/main/golang/sample">Examples</a></li>
</ul>
<h2 id="references">References</h2>
diff --git a/2019/03/18/tvm-apache-announcement.html b/2019/03/18/tvm-apache-announcement.html
index 012b8e3..0e06763 100644
--- a/2019/03/18/tvm-apache-announcement.html
+++ b/2019/03/18/tvm-apache-announcement.html
@@ -140,7 +140,7 @@
<div class="span14 w-100">
<h1>TVM Deep Learning Compiler Joins Apache Software Foundation </h1>
<p class="post-meta">
- <time datetime="2019-03-18T00:00:00-07:00" itemprop="datePublished">
+ <time datetime="2019-03-18T00:00:00-04:00" itemprop="datePublished">
Mar 18, 2019
</time>
diff --git a/2019/04/29/opt-cuda-quantized.html b/2019/04/29/opt-cuda-quantized.html
index 8e24619..aebb4c7 100644
--- a/2019/04/29/opt-cuda-quantized.html
+++ b/2019/04/29/opt-cuda-quantized.html
@@ -140,7 +140,7 @@
<div class="span14 w-100">
<h1>Automating Optimization of Quantized Deep Learning Models on CUDA </h1>
<p class="post-meta">
- <time datetime="2019-04-29T09:00:00-07:00" itemprop="datePublished">
+ <time datetime="2019-04-29T12:00:00-04:00" itemprop="datePublished">
Apr 29, 2019
</time>
@@ -219,7 +219,7 @@ Figure 2. 2D convolution with data layout in NCHW4c and weight layout in OIHW4o4
</div>
<p></p>
-<p>After we have specified the layout of convolution layers, other operators such as <code class="language-plaintext highlighter-rouge">add</code> and activations can automatically adapt to the chosen layout during the <a href="https://github.com/dmlc/tvm/blob/master/src/relay/pass/alter_op_layout.cc">AlterOpLayout</a> pass in Relay.
+<p>After we have specified the layout of convolution layers, other operators such as <code class="language-plaintext highlighter-rouge">add</code> and activations can automatically adapt to the chosen layout during the <a href="https://github.com/apache/incubator-tvm/blob/main/src/relay/pass/alter_op_layout.cc">AlterOpLayout</a> pass in Relay.
The layout transformation of the weight can be precomputed offline. Therefore, we can run the whole model in the same layout without extra overhead.</p>
<h2 id="designing-search-space-for-automatic-optimization">Designing Search Space for Automatic Optimization</h2>
@@ -280,10 +280,10 @@ We show that automatic optimization in TVM makes it easy and flexible to support
<h1 id="show-me-the-code">Show Me the Code</h1>
<ul>
<li><a href="https://github.com/vinx13/tvm-cuda-int8-benchmark">Benchmark</a></li>
- <li><a href="https://github.com/dmlc/tvm/blob/master/topi/python/topi/cuda/conv2d_int8.py">CUDA int8 conv2d</a></li>
- <li><a href="https://github.com/dmlc/tvm/blob/master/topi/python/topi/cuda/group_conv2d_nchw.py">CUDA int8 group conv2d</a></li>
- <li><a href="https://github.com/dmlc/tvm/blob/master/topi/python/topi/cuda/dense.py">CUDA int8 dense</a></li>
- <li><a href="https://github.com/dmlc/tvm/blob/master/topi/python/topi/cuda/tensor_intrin.py">Tensor intrinsics declaration</a></li>
+ <li><a href="https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/cuda/conv2d_int8.py">CUDA int8 conv2d</a></li>
+ <li><a href="https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/cuda/group_conv2d_nchw.py">CUDA int8 group conv2d</a></li>
+ <li><a href="https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/cuda/dense.py">CUDA int8 dense</a></li>
+ <li><a href="https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/cuda/tensor_intrin.py">Tensor intrinsics declaration</a></li>
</ul>
<h1 id="bio--acknowledgement">Bio & Acknowledgement</h1>
diff --git a/2019/05/30/pytorch-frontend.html b/2019/05/30/pytorch-frontend.html
index fc95ffa..4f1ba30 100644
--- a/2019/05/30/pytorch-frontend.html
+++ b/2019/05/30/pytorch-frontend.html
@@ -140,7 +140,7 @@
<div class="span14 w-100">
<h1>Integrating TVM into PyTorch </h1>
<p class="post-meta">
- <time datetime="2019-05-30T00:00:00-07:00" itemprop="datePublished">
+ <time datetime="2019-05-30T00:00:00-04:00" itemprop="datePublished">
May 30, 2019
</time>
diff --git a/2020/05/14/compiling-machine-learning-to-webassembly-and-webgpu.html b/2020/05/14/compiling-machine-learning-to-webassembly-and-webgpu.html
index 0b08a29..d4e4ec8 100644
--- a/2020/05/14/compiling-machine-learning-to-webassembly-and-webgpu.html
+++ b/2020/05/14/compiling-machine-learning-to-webassembly-and-webgpu.html
@@ -140,7 +140,7 @@
<div class="span14 w-100">
<h1>Compiling Machine Learning to WASM and WebGPU with Apache TVM </h1>
<p class="post-meta">
- <time datetime="2020-05-14T00:00:00-07:00" itemprop="datePublished">
+ <time datetime="2020-05-14T00:00:00-04:00" itemprop="datePublished">
May 14, 2020
</time>
diff --git a/2020/06/04/tinyml-how-tvm-is-taming-tiny.html b/2020/06/04/tinyml-how-tvm-is-taming-tiny.html
index 586ccf4..08c5e67 100644
--- a/2020/06/04/tinyml-how-tvm-is-taming-tiny.html
+++ b/2020/06/04/tinyml-how-tvm-is-taming-tiny.html
@@ -140,7 +140,7 @@
<div class="span14 w-100">
<h1>TinyML - How TVM is Taming Tiny </h1>
<p class="post-meta">
- <time datetime="2020-06-04T00:00:00-07:00" itemprop="datePublished">
+ <time datetime="2020-06-04T00:00:00-04:00" itemprop="datePublished">
Jun 4, 2020
</time>
diff --git a/2020/07/14/bert-pytorch-tvm.html b/2020/07/14/bert-pytorch-tvm.html
index 2b63cf0..43cc791 100644
--- a/2020/07/14/bert-pytorch-tvm.html
+++ b/2020/07/14/bert-pytorch-tvm.html
@@ -140,7 +140,7 @@
<div class="span14 w-100">
<h1>Bridging PyTorch and TVM </h1>
<p class="post-meta">
- <time datetime="2020-07-14T00:00:00-07:00" itemprop="datePublished">
+ <time datetime="2020-07-14T00:00:00-04:00" itemprop="datePublished">
Jul 14, 2020
</time>
diff --git a/2020/07/15/how-to-bring-your-own-codegen-to-tvm.html b/2020/07/15/how-to-bring-your-own-codegen-to-tvm.html
index c92704c..155ea18 100644
--- a/2020/07/15/how-to-bring-your-own-codegen-to-tvm.html
+++ b/2020/07/15/how-to-bring-your-own-codegen-to-tvm.html
@@ -140,7 +140,7 @@
<div class="span14 w-100">
<h1>How to Bring Your Own Codegen to TVM </h1>
<p class="post-meta">
- <time datetime="2020-07-15T00:00:00-07:00" itemprop="datePublished">
+ <time datetime="2020-07-15T00:00:00-04:00" itemprop="datePublished">
Jul 15, 2020
</time>
diff --git a/2020/09/26/bring-your-own-datatypes.html b/2020/09/26/bring-your-own-datatypes.html
index 22bcf89..0486f82 100644
--- a/2020/09/26/bring-your-own-datatypes.html
+++ b/2020/09/26/bring-your-own-datatypes.html
@@ -140,7 +140,7 @@
<div class="span14 w-100">
<h1>Bring Your Own Datatypes: Enabling Custom Datatype Exploration in TVM </h1>
<p class="post-meta">
- <time datetime="2020-09-26T00:00:00-07:00" itemprop="datePublished">
+ <time datetime="2020-09-26T00:00:00-04:00" itemprop="datePublished">
Sep 26, 2020
</time>
diff --git a/atom.xml b/atom.xml
index a68c2a3..3b07147 100644
--- a/atom.xml
+++ b/atom.xml
@@ -4,7 +4,7 @@
<title>TVM</title>
<link href="https://tvm.apache.org" rel="self"/>
<link href="https://tvm.apache.org"/>
- <updated>2020-11-02T16:31:02-08:00</updated>
+ <updated>2020-11-03T09:01:59-05:00</updated>
<id>https://tvm.apache.org</id>
<author>
<name></name>
@@ -15,7 +15,7 @@
<entry>
<title>Bring Your Own Datatypes: Enabling Custom Datatype Exploration in TVM</title>
<link href="https://tvm.apache.org/2020/09/26/bring-your-own-datatypes"/>
- <updated>2020-09-26T00:00:00-07:00</updated>
+ <updated>2020-09-26T00:00:00-04:00</updated>
<id>https://tvm.apache.org/2020/09/26/bring-your-own-datatypes</id>
<content type="html"><p>In this post, we describe the Bring Your Own Datatypes framework, which enables the use of custom datatypes within TVM.</p>
@@ -308,7 +308,7 @@ For more documentation about the Bring Your Own Datatypes framework
<entry>
<title>How to Bring Your Own Codegen to TVM</title>
<link href="https://tvm.apache.org/2020/07/15/how-to-bring-your-own-codegen-to-tvm"/>
- <updated>2020-07-15T00:00:00-07:00</updated>
+ <updated>2020-07-15T00:00:00-04:00</updated>
<id>https://tvm.apache.org/2020/07/15/how-to-bring-your-own-codegen-to-tvm</id>
<content type="html"><p>To free data scientists from worrying about the performance when developing a new model, hardware backend providers (e.g., Intel, NVIDIA, ARM, etc) either provide kernel libraries such as cuBLAS or cuDNN with many commonly used deep learning kernels, or provide frameworks such as DNNL or TensorRT with a graph engine to let users describe their models in a certain way to achieve high performance. In addition, emerging deep learning accelerators also have t [...]
@@ -787,7 +787,7 @@ Figure 4: After Graph Partitioning.
<entry>
<title>Bridging PyTorch and TVM</title>
<link href="https://tvm.apache.org/2020/07/14/bert-pytorch-tvm"/>
- <updated>2020-07-14T00:00:00-07:00</updated>
+ <updated>2020-07-14T00:00:00-04:00</updated>
<id>https://tvm.apache.org/2020/07/14/bert-pytorch-tvm</id>
<content type="html">
<p>(A more code-heavy variant is crossposted on the more PyTorch affine <a href="https://lernapparat.de/transformers-pytorch-tvm/">Lernapparat</a>,
@@ -1310,7 +1310,7 @@ He is a PyTorch core developer and co-authored <a href="https://www.mann
<entry>
<title>TinyML - How TVM is Taming Tiny</title>
<link href="https://tvm.apache.org/2020/06/04/tinyml-how-tvm-is-taming-tiny"/>
- <updated>2020-06-04T00:00:00-07:00</updated>
+ <updated>2020-06-04T00:00:00-04:00</updated>
<id>https://tvm.apache.org/2020/06/04/tinyml-how-tvm-is-taming-tiny</id>
<content type="html">
<p><img src="/images/microtvm/logo.png" alt="microTVM logo" width="30%" /><br /></p>
@@ -1619,7 +1619,7 @@ Diagram from CMSIS-NN paper showing a 2x2 matrix multiplication microkernel</
<entry>
<title>Compiling Machine Learning to WASM and WebGPU with Apache TVM</title>
<link href="https://tvm.apache.org/2020/05/14/compiling-machine-learning-to-webassembly-and-webgpu"/>
- <updated>2020-05-14T00:00:00-07:00</updated>
+ <updated>2020-05-14T00:00:00-04:00</updated>
<id>https://tvm.apache.org/2020/05/14/compiling-machine-learning-to-webassembly-and-webgpu</id>
<content type="html"><p><strong>TLDR</strong></p>
@@ -1706,7 +1706,7 @@ Diagram from CMSIS-NN paper showing a 2x2 matrix multiplication microkernel</
<entry>
<title>Integrating TVM into PyTorch</title>
<link href="https://tvm.apache.org/2019/05/30/pytorch-frontend"/>
- <updated>2019-05-30T00:00:00-07:00</updated>
+ <updated>2019-05-30T00:00:00-04:00</updated>
<id>https://tvm.apache.org/2019/05/30/pytorch-frontend</id>
<content type="html"><p>As TVM continuously demonstrates improvements to the efficiency of deep learning execution,
it has become clear that PyTorch stands to benefit from directly leveraging the compiler stack.
@@ -1808,7 +1808,7 @@ relay_graph = torch_tvm.to_relay(mul, inputs)
<entry>
<title>Automating Optimization of Quantized Deep Learning Models on CUDA</title>
<link href="https://tvm.apache.org/2019/04/29/opt-cuda-quantized"/>
- <updated>2019-04-29T09:00:00-07:00</updated>
+ <updated>2019-04-29T12:00:00-04:00</updated>
<id>https://tvm.apache.org/2019/04/29/opt-cuda-quantized</id>
<content type="html"><p>Deep learning has been successfully applied to a variety of tasks.
On real-time scenarios such as inference on autonomous vehicles, the inference speed of the model is critical.
@@ -1877,7 +1877,7 @@ Figure 2. 2D convolution with data layout in NCHW4c and weight layout in OIHW4o4
</div>
<p></p>
-<p>After we have specified the layout of convolution layers, other operators such as <code class="language-plaintext highlighter-rouge">add</code> and activations can automatically adapt to the chosen layout during the <a href="https://github.com/dmlc/tvm/blob/master/src/relay/pass/alter_op_layout.cc">AlterOpLayout</a> pass in Relay.
+<p>After we have specified the layout of convolution layers, other operators such as <code class="language-plaintext highlighter-rouge">add</code> and activations can automatically adapt to the chosen layout during the <a href="https://github.com/apache/incubator-tvm/blob/main/src/relay/pass/alter_op_layout.cc">AlterOpLayout</a> pass in Relay.
The layout transformation of the weight can be precomputed offline. Therefore, we can run the whole model in the same layout without extra overhead.</p>
<h2 id="designing-search-space-for-automatic-optimization">Designing Search Space for Automatic Optimization</h2>
@@ -1938,10 +1938,10 @@ We show that automatic optimization in TVM makes it easy and flexible to support
<h1 id="show-me-the-code">Show Me the Code</h1>
<ul>
<li><a
href="https://github.com/vinx13/tvm-cuda-int8-benchmark">Benchmark</a></li>
- <li><a
href="https://github.com/dmlc/tvm/blob/master/topi/python/topi/cuda/conv2d_int8.py">CUDA
int8 conv2d</a></li>
- <li><a
href="https://github.com/dmlc/tvm/blob/master/topi/python/topi/cuda/group_conv2d_nchw.py">CUDA
int8 group conv2d</a></li>
- <li><a
href="https://github.com/dmlc/tvm/blob/master/topi/python/topi/cuda/dense.py">CUDA
int8 dense</a></li>
- <li><a
href="https://github.com/dmlc/tvm/blob/master/topi/python/topi/cuda/tensor_intrin.py">Tensor
intrinsics declaration</a></li>
+ <li><a
href="https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/cuda/conv2d_int8.py">CUDA
int8 conv2d</a></li>
+ <li><a
href="https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/cuda/group_conv2d_nchw.py">CUDA
int8 group conv2d</a></li>
+ <li><a
href="https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/cuda/dense.py">CUDA
int8 dense</a></li>
+ <li><a
href="https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/cuda/tensor_intrin.py">Tensor
intrinsics declaration</a></li>
</ul>
<h1 id="bio--acknowledgement">Bio &amp;
Acknowledgement</h1>
@@ -1952,7 +1952,7 @@ We show that automatic optimization in TVM makes it easy
and flexible to support
<entry>
<title>TVM Deep Learning Compiler Joins Apache Software Foundation</title>
<link href="https://tvm.apache.org/2019/03/18/tvm-apache-announcement"/>
- <updated>2019-03-18T00:00:00-07:00</updated>
+ <updated>2019-03-18T00:00:00-04:00</updated>
<id>https://tvm.apache.org/2019/03/18/tvm-apache-announcement</id>
<content type="html"><p>There is an increasing need to bring machine
learning to a wide diversity of hardware devices. Current frameworks rely on
vendor-specific operator libraries and optimize for a narrow range of
server-class GPUs. Deploying workloads to new platforms – such as mobile
phones, embedded devices, and accelerators (e.g., FPGAs, ASICs) – requires
significant manual effort.</p>
@@ -1975,7 +1975,7 @@ We show that automatic optimization in TVM makes it easy
and flexible to support
<entry>
<title>TVM Golang Runtime for Deep Learning Deployment</title>
<link href="https://tvm.apache.org/2019/01/19/Golang"/>
- <updated>2019-01-19T00:00:00-08:00</updated>
+ <updated>2019-01-19T00:00:00-05:00</updated>
<id>https://tvm.apache.org/2019/01/19/Golang</id>
<content type="html"><h2
id="introduction">Introduction</h2>
@@ -2118,14 +2118,14 @@ For simplicity the error handling is ignored here, but
is important in real appl
</code></pre></div></div>
<p><code class="language-plaintext
highlighter-rouge">gotvm</code> extends the TVM packed function
system to support golang function closures as packed functions.
-<a
href="https://github.com/dmlc/tvm/blob/master/golang/sample">Examples</a>
available to register golang
+<a
href="https://github.com/apache/incubator-tvm/blob/main/golang/sample">Examples</a>
available to register golang
closure as TVM packed function and invoke the same across programming language
barriers.</p>
<h2 id="show-me-the-code">Show me the code</h2>
<ul>
- <li><a
href="https://github.com/dmlc/tvm/blob/master/golang/src">Package
Source</a></li>
- <li><a
href="https://github.com/dmlc/tvm/blob/master/golang/sample">Examples</a></li>
+ <li><a
href="https://github.com/apache/incubator-tvm/blob/main/golang/src">Package
Source</a></li>
+ <li><a
href="https://github.com/apache/incubator-tvm/blob/main/golang/sample">Examples</a></li>
</ul>
<h2 id="references">References</h2>
@@ -2145,7 +2145,7 @@ closure as TVM packed function and invoke the same across
programming language b
<entry>
<title>Automating Generation of Low Precision Deep Learning
Operators</title>
<link href="https://tvm.apache.org/2018/12/18/lowprecision-conv"/>
- <updated>2018-12-18T00:00:00-08:00</updated>
+ <updated>2018-12-18T00:00:00-05:00</updated>
<id>https://tvm.apache.org/2018/12/18/lowprecision-conv</id>
<content type="html"><p>As deep learning models grow larger and more
complex, deploying them on low powered phone and IoT
devices becomes challenging because of their limited compute and energy
budgets. A recent trend
@@ -2287,8 +2287,8 @@ Note: x86 doesn’t support a vectorized popcount for this
microarchitecture, so
<h2 id="show-me-the-code">Show me the code</h2>
<ul>
- <li><a
href="https://github.com/dmlc/tvm/blob/master/topi/python/topi/nn/bitserial_conv2d.py">TOPI
bitserial convolution</a></li>
- <li><a
href="https://github.com/dmlc/tvm/blob/master/topi/python/topi/arm_cpu/bitserial_conv2d.py">TOPI
ARM cpu bitserial convolution</a></li>
+ <li><a
href="https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/nn/bitserial_conv2d.py">TOPI
bitserial convolution</a></li>
+ <li><a
href="https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/arm_cpu/bitserial_conv2d.py">TOPI
ARM cpu bitserial convolution</a></li>
</ul>
<h2 id="references">References</h2>
@@ -2306,7 +2306,7 @@ Note: x86 doesn’t support a vectorized popcount for this
microarchitecture, so
<entry>
<title>Efficient Privacy-Preserving ML Using TVM</title>
<link href="https://tvm.apache.org/2018/10/09/ml-in-tees"/>
- <updated>2018-10-09T00:00:00-07:00</updated>
+ <updated>2018-10-09T00:00:00-04:00</updated>
<id>https://tvm.apache.org/2018/10/09/ml-in-tees</id>
<content type="html"><p>This post describes Myelin, a framework for
privacy-preserving machine learning in trusted hardware enclaves, and how TVM
makes Myelin fast.
The key idea is that TVM, unlike other popular ML frameworks, compiles models
into lightweight, optimized, and dependency-free libraries which can fit into
resource constrained enclaves.</p>
@@ -2422,7 +2422,7 @@ His research interest is in the general domain of ML on
shared private data, but
<entry>
<title>Automatic Kernel Optimization for Deep Learning on All Hardware
Platforms</title>
<link href="https://tvm.apache.org/2018/10/03/auto-opt-all"/>
- <updated>2018-10-03T00:00:00-07:00</updated>
+ <updated>2018-10-03T00:00:00-04:00</updated>
<id>https://tvm.apache.org/2018/10/03/auto-opt-all</id>
<content type="html"><p>Optimizing the performance of deep neural
network on a diverse range of hardware platforms is still a hard
problem for AI developers. In terms of system support, we are facing a
many-to-many problem here:
@@ -2816,7 +2816,7 @@ for inference deployment. TVM just provides such a
solution.</p>
<entry>
<title>Building a Cross-Framework Deep Learning Compiler via DLPack</title>
<link href="https://tvm.apache.org/2018/08/10/DLPack-Bridge"/>
- <updated>2018-08-10T00:00:00-07:00</updated>
+ <updated>2018-08-10T00:00:00-04:00</updated>
<id>https://tvm.apache.org/2018/08/10/DLPack-Bridge</id>
<content type="html"><p>Deep learning frameworks such as Tensorflow,
PyTorch, and ApacheMxNet provide a
powerful toolbox for quickly prototyping and deploying deep learning models.
@@ -2928,7 +2928,7 @@ found <a
href="https://tvm.apache.org/docs//tutorials/optimize/opt_gemm.
</code></pre></div></div>
<h2 id="under-the-hood-of-the-pytorch-example">Under the hood
of the PyTorch Example</h2>
-<p>As TVM provides <a
href="https://github.com/dmlc/tvm/blob/master/include/tvm/runtime/c_runtime_api.h#L455">functions</a>
to convert dlpack tensors to tvm <code class="language-plaintext
highlighter-rouge">NDArray</code>s and
+<p>As TVM provides <a
href="https://github.com/apache/incubator-tvm/blob/main/include/tvm/runtime/c_runtime_api.h#L455">functions</a>
to convert dlpack tensors to tvm <code class="language-plaintext
highlighter-rouge">NDArray</code>s and
vice-versa, so all that is needed is some syntactic sugar by wrapping
functions.
<code class="language-plaintext
highlighter-rouge">convert_func</code> is a generic converter for
frameworks using tensors with dlpack
support, and can be used to implement convenient converters, such as
@@ -2955,7 +2955,7 @@ support, and can be used to implement convenient
converters, such as
<entry>
<title>VTA: An Open, Customizable Deep Learning Acceleration Stack </title>
<link href="https://tvm.apache.org/2018/07/12/vta-release-announcement"/>
- <updated>2018-07-12T00:00:00-07:00</updated>
+ <updated>2018-07-12T00:00:00-04:00</updated>
<id>https://tvm.apache.org/2018/07/12/vta-release-announcement</id>
<content type="html"><p style="text-align: center">Thierry
Moreau(VTA architect), Tianqi Chen(TVM stack), Ziheng Jiang†(graph
compilation), Luis Vega(cloud deployment)</p>
<p style="text-align: center">Advisors: Luis Ceze, Carlos
Guestrin, Arvind Krishnamurthy</p>
@@ -2967,7 +2967,7 @@ support, and can be used to implement convenient
converters, such as
<p>VTA is more than a standalone accelerator design: it’s an end-to-end
solution that includes drivers, a JIT runtime, and an optimizing compiler stack
based on TVM. The current release includes a behavioral hardware simulator, as
well as the infrastructure to deploy VTA on low-cost FPGA hardware for fast
prototyping. By extending the TVM stack with a customizable, and open source
deep learning hardware accelerator design, we are exposing a transparent
end-to-end deep learning stac [...]
-<p style="text-align: center"><img
src="https://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_stack.png"
alt="image" width="50%" /></p>
+<p style="text-align: center"><img
src="https://raw.githubusercontent.com/uwsampl/web-data/master/vta/blogpost/vta_stack.png"
alt="image" width="50%" /></p>
<p>The VTA and TVM stack together constitute a blueprint for end-to-end,
accelerator-centric deep learning system that can:</p>
@@ -3022,7 +3022,7 @@ The extendability of the compiler stack, combined with
the ability to modify the
<p>The Vanilla Tensor Accelerator (VTA) is a generic deep learning
accelerator built around a GEMM core, which performs dense matrix
multiplication at a high computational throughput.
The design is inspired by mainstream deep learning accelerators, of the likes
of Google’s TPU accelerator. The design adopts decoupled access-execute to hide
memory access latency and maximize utilization of compute resources. To a
broader extent, VTA can serve as a template deep learning accelerator design,
exposing a clean tensor computation abstraction to the compiler stack.</p>
-<p style="text-align: center"><img
src="https://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_overview.png"
alt="image" width="60%" /></p>
+<p style="text-align: center"><img
src="https://raw.githubusercontent.com/uwsampl/web-data/master/vta/blogpost/vta_overview.png"
alt="image" width="60%" /></p>
<p>The figure above presents a high-level overview of the VTA hardware
organization. VTA is composed of four modules that communicate between each
other via FIFO queues and single-writer/single-reader SRAM memory blocks, to
allow for task-level pipeline parallelism.
The compute module performs both dense linear algebra computation with its
GEMM core, and general computation with its tensor ALU.
@@ -3039,7 +3039,7 @@ The first approach, which doesn’t require special
hardware is to run deep lear
This simulator back-end is readily available for developers to experiment with.
The second approach relies on an off-the-shelf and low-cost FPGA development
board – the <a href="http://www.pynq.io/">Pynq board</a>,
which exposes a reconfigurable FPGA fabric and an ARM SoC.</p>
-<p style="text-align: center"><img
src="https://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_system.png"
alt="image" width="70%" /></p>
+<p style="text-align: center"><img
src="https://raw.githubusercontent.com/uwsampl/web-data/master/vta/blogpost/vta_system.png"
alt="image" width="70%" /></p>
<p>The VTA release offers a simple compilation and deployment flow of
the VTA hardware design and TVM workloads on the Pynq platform, with the help
of an RPC server interface.
The RPC server handles FPGA reconfiguration tasks and TVM module invocation
offloading onto the VTA runtime.
@@ -3062,7 +3062,7 @@ While this platform is meant for prototyping (the 2012
FPGA cannot compete with
<p>A popular method used to assess the efficient use of hardware are
roofline diagrams: given a hardware design, how efficiently are different
workloads utilizing the hardware compute and memory resources. The roofline
plot below shows the throughput achieved on different convolution layers of the
ResNet-18 inference benchmark. Each layer has a different arithmetic intensity,
i.e. compute to data movement ratio.
In the left half, convolution layers are bandwidth limited, whereas on the
right half, they are compute limited.</p>
-<p style="text-align: center"><img
src="https://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_roofline.png"
alt="image" width="60%" /></p>
+<p style="text-align: center"><img
src="https://raw.githubusercontent.com/uwsampl/web-data/master/vta/blogpost/vta_roofline.png"
alt="image" width="60%" /></p>
<p>The goal behind designing a hardware architecture, and a compiler
stack is to bring each workload as close as possible to the roofline of the
target hardware.
The roofline plot shows the effects of having the hardware and compiler work
together to maximize utilization of the available hardware resources.
@@ -3071,7 +3071,7 @@ The result is an overall higher utilization of the
available compute and memory
<h3 id="end-to-end-resnet-18-evaluation">End to end ResNet-18
evaluation</h3>
-<p style="text-align: center"><img
src="https://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_e2e.png"
alt="image" width="60%" /></p>
+<p style="text-align: center"><img
src="https://raw.githubusercontent.com/uwsampl/web-data/master/vta/blogpost/vta_e2e.png"
alt="image" width="60%" /></p>
<p>A benefit of having a complete compiler stack built for VTA is the
ability to run end-to-end workloads. This is compelling in the context of
hardware acceleration because we need to understand what performance
bottlenecks, and Amdahl limitations stand in the way to obtaining faster
performance.
The bar plot above shows inference performance with and without offloading the
ResNet convolutional layers to the FPGA-based VTA design, on the Pynq board’s
ARM Cortex A9 SoC.
@@ -3097,7 +3097,7 @@ This kind of high-level visibility is essential to system
designers who want to
<entry>
<title>Bringing TVM into TensorFlow for Optimizing Neural Machine
Translation on GPU</title>
<link href="https://tvm.apache.org/2018/03/23/nmt-transformer-optimize"/>
- <updated>2018-03-23T00:00:00-07:00</updated>
+ <updated>2018-03-23T00:00:00-04:00</updated>
<id>https://tvm.apache.org/2018/03/23/nmt-transformer-optimize</id>
<content type="html"><h2 id="author">Author</h2>
@@ -3363,7 +3363,7 @@ C = tvm.compute(
<entry>
<title>Compiling Deep Learning Models to WebGL with TVM</title>
<link href="https://tvm.apache.org/2018/03/12/webgl"/>
- <updated>2018-03-12T00:00:00-07:00</updated>
+ <updated>2018-03-12T00:00:00-04:00</updated>
<id>https://tvm.apache.org/2018/03/12/webgl</id>
<content type="html"><p>Now TVM comes with a brand-new OpenGL/WebGL
backend!
This blog post explains what it is, and what you can achieve with it.</p>
@@ -3479,7 +3479,7 @@ optimizations into the TVM stack.</p>
<entry>
<title>Optimizing Mobile Deep Learning on ARM GPU with TVM</title>
<link href="https://tvm.apache.org/2018/01/16/opt-mali-gpu"/>
- <updated>2018-01-16T00:00:00-08:00</updated>
+ <updated>2018-01-16T00:00:00-05:00</updated>
<id>https://tvm.apache.org/2018/01/16/opt-mali-gpu</id>
<content type="html"><p>With the great success of deep learning, the
demand for
deploying deep neural networks to mobile devices is growing rapidly.
@@ -4053,7 +4053,7 @@ advice and <a
href="https://github.com/yzhliu">Yizhi Liu</a&g
<entry>
<title>Remote Profile and Test Deep Learning Cross Compilation on Mobile
Phones with TVM RPC</title>
<link href="https://tvm.apache.org/2017/11/08/android-rpc-introduction"/>
- <updated>2017-11-08T00:00:00-08:00</updated>
+ <updated>2017-11-08T00:00:00-05:00</updated>
<id>https://tvm.apache.org/2017/11/08/android-rpc-introduction</id>
<content type="html"><p>TVM stack is an end to end compilation stack
to deploy deep learning workloads to all hardware backends.
Thanks to the NNVM compiler support of TVM stack, we can now directly compile
descriptions from deep learning frameworks and compile them to bare metal code.
@@ -4281,7 +4281,7 @@ make jvminstall
<entry>
<title>Bringing AMDGPUs to TVM Stack and NNVM Compiler with ROCm</title>
<link
href="https://tvm.apache.org/2017/10/30/Bringing-AMDGPUs-to-TVM-Stack-and-NNVM-Compiler-with-ROCm"/>
- <updated>2017-10-30T00:00:00-07:00</updated>
+ <updated>2017-10-30T00:00:00-04:00</updated>
<id>https://tvm.apache.org/2017/10/30/Bringing-AMDGPUs-to-TVM-Stack-and-NNVM-Compiler-with-ROCm</id>
<content type="html"><p style="text-align: center">Aditya
Atluri, Advanced Micro Devices, Inc.</p>
<p style="text-align: center">Masahiro Masuda, Ziosoft,
Inc.</p>
@@ -4339,7 +4339,7 @@ TVM prediction top-1: 282 tiger
cat</code></pre></figure>
<h2 id="a-note-on-performance">A Note on performance</h2>
-<p>The current support on ROCm focuses on the functionality coverage. We
have already seen promising performance results by simply adopting existing TVM
schedules for CUDA backend. For example, you can try running <a
href="https://github.com/dmlc/tvm/blob/master/topi/recipe/gemm/cuda_gemm_square.py">the
gemm test script</a> in the TVM repository and see the result. For two
types of cards we tested, the current gemm recipe for square matrix
multiplication (not [...]
+<p>The current support on ROCm focuses on the functionality coverage. We
have already seen promising performance results by simply adopting existing TVM
schedules for CUDA backend. For example, you can try running <a
href="https://github.com/apache/incubator-tvm/blob/main/topi/recipe/gemm/cuda_gemm_square.py">the
gemm test script</a> in the TVM repository and see the result. For two
types of cards we tested, the current gemm recipe for square matrix multiplica
[...]
This is already a promising start, as it is very hard to optimize performance
to get to peak and we
did not yet apply AMD GPU specific optimizations.
We are starting to look at performance optimization and we expect more
improvement to come.</p>
@@ -4507,7 +4507,7 @@ BB0_6:
<entry>
<title>NNVM Compiler: Open Compiler for AI Frameworks</title>
<link href="https://tvm.apache.org/2017/10/06/nnvm-compiler-announcement"/>
- <updated>2017-10-06T08:30:00-07:00</updated>
+ <updated>2017-10-06T11:30:00-04:00</updated>
<id>https://tvm.apache.org/2017/10/06/nnvm-compiler-announcement</id>
<content type="html"><p style="text-align: center">Paul G.
Allen School of Computer Science &amp; Engineering, University of
Washington</p>
<p style="text-align: center">Amazon Web Service AI
team</p>
diff --git a/feed.xml b/feed.xml
index 0f56d75..2406202 100644
--- a/feed.xml
+++ b/feed.xml
@@ -1,4 +1,4 @@
-<?xml version="1.0" encoding="utf-8"?><feed
xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/"
version="4.1.1">Jekyll</generator><link href="/feed.xml" rel="self"
type="application/atom+xml" /><link href="/" rel="alternate" type="text/html"
/><updated>2020-11-02T16:31:02-08:00</updated><id>/feed.xml</id><title
type="html">TVM</title><author><name>{"name"=>nil}</name></author><entry><title
type="html">Bring Your Own Datatypes: Enabling Custom Datatype [...]
+<?xml version="1.0" encoding="utf-8"?><feed
xmlns="http://www.w3.org/2005/Atom" ><generator uri="https://jekyllrb.com/"
version="4.1.1">Jekyll</generator><link href="/feed.xml" rel="self"
type="application/atom+xml" /><link href="/" rel="alternate" type="text/html"
/><updated>2020-11-03T09:01:59-05:00</updated><id>/feed.xml</id><title
type="html">TVM</title><author><name>{"name"=>nil}</name></author><entry><title
type="html">Bring Your Own Datatypes: Enabling Custom Datatype [...]
<h2 id="introduction">Introduction</h2>
@@ -282,7 +282,7 @@ For more documentation about the Bring Your Own Datatypes
framework
<p><a
href="https://posithub.org/docs/BeatingFloatingPoint.pdf"
target="_blank">Beating Floating Point at its Own Game: Posit
Arithmetic</a> <a href="#fnref:posit"
class="reversefootnote"
role="doc-backlink">&#8617;</a></p>
</li>
</ol>
-</div></content><author><name>Gus Smith, Andrew
Liu</name></author><summary type="html">In this post, we describe the Bring
Your Own Datatypes framework, which enables the use of custom datatypes within
TVM.</summary></entry><entry><title type="html">How to Bring Your Own Codegen
to TVM</title><link href="/2020/07/15/how-to-bring-your-own-codegen-to-tvm"
rel="alternate" type="text/html" title="How to Bring Your Own Codegen to TVM"
/><published>2020-07-15T00:00:00-07:00</published>< [...]
+</div></content><author><name>Gus Smith, Andrew
Liu</name></author><summary type="html">In this post, we describe the Bring
Your Own Datatypes framework, which enables the use of custom datatypes within
TVM.</summary></entry><entry><title type="html">How to Bring Your Own Codegen
to TVM</title><link href="/2020/07/15/how-to-bring-your-own-codegen-to-tvm"
rel="alternate" type="text/html" title="How to Bring Your Own Codegen to TVM"
/><published>2020-07-15T00:00:00-04:00</published>< [...]
<p>However, users have to learn a new programming interface when they
attempt to work on a new kernel library or a device. As a result, the demand
for a unified programming interface becomes more and more important to let all
users and hardware backend providers stand on the same page.</p>
@@ -751,7 +751,7 @@ Figure 4: After Graph Partitioning.
<h2 id="acknowledgment">Acknowledgment</h2>
-<p>We would like to thank our colleague Animesh Jain for valuable
discussions in the framework design; Tianqi Chen and Jared Roesch from OctoML
for system design discussions and prototyping; Masahiro Masuda from the TVM
community to help code review and improve the DNNL integration. We would also
like to thank Ramana Radhakrishnan, Matthew Barrett, Manupa Karunaratne, and
Luke Hutton from ARM, U.K. for contributing several helpful ideas, related
Relay passes, and the Arm Compute Li [...]
+<p>We would like to thank our colleague Animesh Jain for valuable
discussions in the framework design; Tianqi Chen and Jared Roesch from OctoML
for system design discussions and prototyping; Masahiro Masuda from the TVM
community to help code review and improve the DNNL integration. We would also
like to thank Ramana Radhakrishnan, Matthew Barrett, Manupa Karunaratne, and
Luke Hutton from ARM, U.K. for contributing several helpful ideas, related
Relay passes, and the Arm Compute Li [...]
the Jupyter Notebook to follow along is on <a
href="https://github.com/t-vi/pytorch-tvmisc/tree/master/transformers-pytorch-tvm/">github</a>.)</p>
<p>Some of the most intriguing applications of Artificial Intelligence
have been in Natural Language Processing.
@@ -1264,7 +1264,7 @@ one would want to re-do cheap computation, most
prominently point-wise computati
<h1 id="author">Author</h1>
<p><a href="https://lernapparat.de/">Thomas
Viehmann</a> is the founder of <a
href="https://mathinf.eu/">MathInf GmbH</a>, Munich,
Germany, a boutique training and consultancy firm focusing on Machine Learning
and PyTorch.
-He is a PyTorch core developer and co-authored <a
href="https://www.manning.com/books/deep-learning-with-pytorch">Deep
Learning with PyTorch</a>, which currently available as <a
href="https://pytorch.org/deep-learning-with-pytorch">free
download from the PyTorch
website</a>.</p></content><author><name>Thomas Viehmann, MathInf
GmbH</name></author><summary type="html"></summary></entry><entry><title
type="html">TinyML - How TVM is Taming Ti [...]
+He is a PyTorch core developer and co-authored <a
href="https://www.manning.com/books/deep-learning-with-pytorch">Deep
Learning with PyTorch</a>, which currently available as <a
href="https://pytorch.org/deep-learning-with-pytorch">free
download from the PyTorch
website</a>.</p></content><author><name>Thomas Viehmann, MathInf
GmbH</name></author><summary type="html"></summary></entry><entry><title
type="html">TinyML - How TVM is Taming Ti [...]
<p>The proliferation of low-cost, AI-powered consumer devices has led to
widespread interest in “bare-metal” (low-power, often without an operating
system) devices among ML researchers and practitioners. While it is already
possible for experts to run <em>some</em> models on
<em>some</em> bare-metal devices, optimizing models for diverse
sets of devices is challenging, often requiring manually optimized
device-specific libraries. And for those platforms wi [...]
@@ -1563,7 +1563,7 @@ Diagram from CMSIS-NN paper showing a 2x2 matrix
multiplication microkernel</
<li><a
href="https://homes.cs.washington.edu/~moreau/">Thierry
Moreau</a>, for mentoring me during my time at OctoML.</li>
<li><a
href="https://homes.cs.washington.edu/~vegaluis/">Luis
Vega</a>, for teaching me the fundamentals of interacting with
microcontrollers.</li>
<li><a
href="https://www.linkedin.com/in/themadrasi/?originalSubdomain=uk">Ramana
Radhakrishnan</a>, for supplying the Arm hardware used in our
experiments and for providing guidance on its usage.</li>
-</ul></content><author><name>Logan Weber and Andrew Reusch,
OctoML</name></author><summary type="html"></summary></entry><entry><title
type="html">Compiling Machine Learning to WASM and WebGPU with Apache
TVM</title><link
href="/2020/05/14/compiling-machine-learning-to-webassembly-and-webgpu"
rel="alternate" type="text/html" title="Compiling Machine Learning to WASM and
WebGPU with Apache TVM"
/><published>2020-05-14T00:00:00-07:00</published><updated>2020-05-14T00:00:00-07:00</upd
[...]
+</ul></content><author><name>Logan Weber and Andrew Reusch,
OctoML</name></author><summary type="html"></summary></entry><entry><title
type="html">Compiling Machine Learning to WASM and WebGPU with Apache
TVM</title><link
href="/2020/05/14/compiling-machine-learning-to-webassembly-and-webgpu"
rel="alternate" type="text/html" title="Compiling Machine Learning to WASM and
WebGPU with Apache TVM"
/><published>2020-05-14T00:00:00-04:00</published><updated>2020-05-14T00:00:00-04:00</upd
[...]
<p>We introduced support for WASM and WebGPU to the Apache TVM deep
learning compiler. Our experiments shows that TVM’s WebGPU backend can get
<strong>close to native</strong> <strong>GPU
performance</strong> when deploying models to the web.</p>
@@ -1641,7 +1641,7 @@ Diagram from CMSIS-NN paper showing a 2x2 matrix
multiplication microkernel</
<h2 id="acknowledgement">Acknowledgement</h2>
-<p>We would like to thank the emscripten project for providing the WASM
compilation infrastructures as well as the JS library support on the web. We
would also like to thank the WebGPU community for various helpful discussions.
Thanks to Fletcher Haynes for valuable feedbacks to the
post.</p></content><author><name>Tianqi Chen and Jared Roesch,
OctoML</name></author><summary type="html">TLDR</summary></entry><entry><title
type="html">Integrating TVM into PyTorch</title><link [...]
+<p>We would like to thank the emscripten project for providing the WASM
compilation infrastructures as well as the JS library support on the web. We
would also like to thank the WebGPU community for various helpful discussions.
Thanks to Fletcher Haynes for valuable feedbacks to the
post.</p></content><author><name>Tianqi Chen and Jared Roesch,
OctoML</name></author><summary type="html">TLDR</summary></entry><entry><title
type="html">Integrating TVM into PyTorch</title><link [...]
it has become clear that PyTorch stands to benefit from directly leveraging
the compiler stack.
A major tenet of PyTorch is providing seamless and robust integrations that
don’t get in the user’s way.
To that end, PyTorch now has an official TVM-based backend, <a
href="https://github.com/pytorch/tvm">torch_tvm</a>.</p>
@@ -1733,7 +1733,7 @@ def mul(a, b, c):
# via script
relay_graph = torch_tvm.to_relay(mul, inputs)
-</code></pre></div></div></content><author><name>Bram
Wasti</name></author><summary type="html">As TVM continuously demonstrates
improvements to the efficiency of deep learning execution, it has become clear
that PyTorch stands to benefit from directly leveraging the compiler stack. A
major tenet of PyTorch is providing seamless and robust integrations that don’t
get in the user’s way. To that end, PyTorch now has an official TVM-based
backend, torch_tvm.</summary [...]
+</code></pre></div></div></content><author><name>Bram
Wasti</name></author><summary type="html">As TVM continuously demonstrates
improvements to the efficiency of deep learning execution, it has become clear
that PyTorch stands to benefit from directly leveraging the compiler stack. A
major tenet of PyTorch is providing seamless and robust integrations that don’t
get in the user’s way. To that end, PyTorch now has an official TVM-based
backend, torch_tvm.</summary [...]
On real-time scenarios such as inference on autonomous vehicles, the inference
speed of the model is critical.
Network quantization is an effective approach to accelerating deep learning
models.
In quantized models, both data and model parameters are represented with low
precision data types such as <code class="language-plaintext
highlighter-rouge">int8</code> and <code
class="language-plaintext highlighter-rouge">float16</code>.
@@ -1800,7 +1800,7 @@ Figure 2. 2D convolution with data layout in NCHW4c and
weight layout in OIHW4o4
</div>
<p></p>
-<p>After we have specified the layout of convolution layers, other
operators such as <code class="language-plaintext
highlighter-rouge">add</code> and activations can automatically
adapt to the chosen layout during the <a
href="https://github.com/dmlc/tvm/blob/master/src/relay/pass/alter_op_layout.cc">AlterOpLayout</a>
pass in Relay.
+<p>After we have specified the layout of convolution layers, other
operators such as <code class="language-plaintext
highlighter-rouge">add</code> and activations can automatically
adapt to the chosen layout during the <a
href="https://github.com/apache/incubator-tvm/blob/main/src/relay/pass/alter_op_layout.cc">AlterOpLayout</a>
pass in Relay.
The layout transformation of the weight can be precomputed offline. Therefore,
we can run the whole model in the same layout without extra overhead.</p>
<h2
id="designing-search-space-for-automatic-optimization">Designing
Search Space for Automatic Optimization</h2>
@@ -1861,14 +1861,14 @@ We show that automatic optimization in TVM makes it
easy and flexible to support
<h1 id="show-me-the-code">Show Me the Code</h1>
<ul>
<li><a
href="https://github.com/vinx13/tvm-cuda-int8-benchmark">Benchmark</a></li>
- <li><a
href="https://github.com/dmlc/tvm/blob/master/topi/python/topi/cuda/conv2d_int8.py">CUDA
int8 conv2d</a></li>
- <li><a
href="https://github.com/dmlc/tvm/blob/master/topi/python/topi/cuda/group_conv2d_nchw.py">CUDA
int8 group conv2d</a></li>
- <li><a
href="https://github.com/dmlc/tvm/blob/master/topi/python/topi/cuda/dense.py">CUDA
int8 dense</a></li>
- <li><a
href="https://github.com/dmlc/tvm/blob/master/topi/python/topi/cuda/tensor_intrin.py">Tensor
intrinsics declaration</a></li>
+ <li><a
href="https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/cuda/conv2d_int8.py">CUDA
int8 conv2d</a></li>
+ <li><a
href="https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/cuda/group_conv2d_nchw.py">CUDA
int8 group conv2d</a></li>
+ <li><a
href="https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/cuda/dense.py">CUDA
int8 dense</a></li>
+ <li><a
href="https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/cuda/tensor_intrin.py">Tensor
intrinsics declaration</a></li>
</ul>
<h1 id="bio--acknowledgement">Bio &amp;
Acknowledgement</h1>
-<p><a href="https://wuwei.io/">Wuwei Lin</a> is an
undergraduate student at SJTU. He is currently an intern at TuSimple. The
author has many thanks to <a
href="https://homes.cs.washington.edu/~tqchen/">Tianqi
Chen</a> and <a
href="https://homes.cs.washington.edu/~eqy/">Eddie Yan</a>
for their reviews.</p></content><author><name>Wuwei
Lin</name></author><summary type="html">Deep learning has been successfully ap
[...]
+<p><a href="https://wuwei.io/">Wuwei Lin</a> is an
undergraduate student at SJTU. He is currently an intern at TuSimple. The
author has many thanks to <a
href="https://homes.cs.washington.edu/~tqchen/">Tianqi
Chen</a> and <a
href="https://homes.cs.washington.edu/~eqy/">Eddie Yan</a>
for their reviews.</p></content><author><name>Wuwei
Lin</name></author><summary type="html">Deep learning has been successfully ap
[...]
<p>TVM is an open source deep learning compiler stack that closes the
gap between the productivity-focused deep learning frameworks, and the
performance- or efficiency-oriented hardware backends. Today, we are glad to
announce that the TVM community has decided to move on to Apache incubator, and
becomes an Apache(incubating) project.</p>
@@ -1882,7 +1882,7 @@ We show that automatic optimization in TVM makes it easy and flexible to support
<p>We would like to take this chance to thank the Allen School for
supporting the SAMPL team that gave birth to the TVM project. We would also
like to thank the Halide project which provided the basis for TVM’s loop-level
IR and initial code generation. We would like to thank our Apache incubator
mentors for introducing the project to Apache and providing useful guidance.
Finally, we would like to thank the TVM community and all of the organizations,
as listed above, that supported [...]
-<p>See also the <a
href="https://news.cs.washington.edu/2019/03/18/allen-schools-tvm-deep-learning-compiler-framework-transitions-to-apache/">Allen
School news about the transition here</a>, <a
href="https://sampl.cs.washington.edu/tvmconf/#about-tvmconf">TVM
conference program slides and recordings</a>, and <a
href="https://tvm.apache.org/docs//contribute/community.html">our
community guideline here</a>. Follow us o [...]
+<p>See also the <a
href="https://news.cs.washington.edu/2019/03/18/allen-schools-tvm-deep-learning-compiler-framework-transitions-to-apache/">Allen
School news about the transition here</a>, <a
href="https://sampl.cs.washington.edu/tvmconf/#about-tvmconf">TVM
conference program slides and recordings</a>, and <a
href="https://tvm.apache.org/docs//contribute/community.html">our
community guideline here</a>. Follow us o [...]
<p>TVM is an open deep learning compiler stack to compile various deep
learning models from different
frameworks to CPU, GPU or specialized accelerators. TVM supports model
compilation from a wide range
@@ -2023,14 +2023,14 @@ For simplicity the error handling is ignored here, but is important in real appl
</code></pre></div></div>
<p><code class="language-plaintext
highlighter-rouge">gotvm</code> extends the TVM packed function
system to support golang function closures as packed functions.
-<a
href="https://github.com/dmlc/tvm/blob/master/golang/sample">Examples</a>
available to register golang
+<a
href="https://github.com/apache/incubator-tvm/blob/main/golang/sample">Examples</a>
available to register golang
closure as TVM packed function and invoke the same across programming language
barriers.</p>
<h2 id="show-me-the-code">Show me the code</h2>
<ul>
- <li><a
href="https://github.com/dmlc/tvm/blob/master/golang/src">Package
Source</a></li>
- <li><a
href="https://github.com/dmlc/tvm/blob/master/golang/sample">Examples</a></li>
+ <li><a
href="https://github.com/apache/incubator-tvm/blob/main/golang/src">Package
Source</a></li>
+ <li><a
href="https://github.com/apache/incubator-tvm/blob/main/golang/sample">Examples</a></li>
</ul>
<h2 id="references">References</h2>
@@ -2043,7 +2043,7 @@ closure as TVM packed function and invoke the same across programming language b
<li>[5] <a
href="https://blog.learngoprogramming.com/golang-variadic-funcs-how-to-patterns-369408f19085">Go
Variadic Functions</a></li>
<li>[6] <a
href="https://github.com/jdeng/gomxnet">CFFI
Ref</a></li>
<li>[7] <a
href="https://golang.org/pkg/runtime/#SetFinalizer">Go
Finalizers</a></li>
-</ul></content><author><name>Siva</name></author><summary
type="html">Introduction</summary></entry><entry><title type="html">Automating
Generation of Low Precision Deep Learning Operators</title><link
href="/2018/12/18/lowprecision-conv" rel="alternate" type="text/html"
title="Automating Generation of Low Precision Deep Learning Operators"
/><published>2018-12-18T00:00:00-08:00</published><updated>2018-12-18T00:00:00-08:00</updated><id>/2018/12/18/lowprecision-conv</id><content
ty [...]
+</ul></content><author><name>Siva</name></author><summary
type="html">Introduction</summary></entry><entry><title type="html">Automating
Generation of Low Precision Deep Learning Operators</title><link
href="/2018/12/18/lowprecision-conv" rel="alternate" type="text/html"
title="Automating Generation of Low Precision Deep Learning Operators"
/><published>2018-12-18T00:00:00-05:00</published><updated>2018-12-18T00:00:00-05:00</updated><id>/2018/12/18/lowprecision-conv</id><content
ty [...]
devices becomes challenging because of their limited compute and energy
budgets. A recent trend
in deep learning is the use of extremely quantized models that
operate on inputs and
weights of a few bits, with networks like XNOR-Net, DoReFa-Net, and
HWGQ-Net making steady
@@ -2183,8 +2183,8 @@ Note: x86 doesn’t support a vectorized popcount for this microarchitecture, so
<h2 id="show-me-the-code">Show me the code</h2>
<ul>
- <li><a
href="https://github.com/dmlc/tvm/blob/master/topi/python/topi/nn/bitserial_conv2d.py">TOPI
bitserial convolution</a></li>
- <li><a
href="https://github.com/dmlc/tvm/blob/master/topi/python/topi/arm_cpu/bitserial_conv2d.py">TOPI
ARM cpu bitserial convolution</a></li>
+ <li><a
href="https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/nn/bitserial_conv2d.py">TOPI
bitserial convolution</a></li>
+ <li><a
href="https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/arm_cpu/bitserial_conv2d.py">TOPI
ARM cpu bitserial convolution</a></li>
</ul>
<h2 id="references">References</h2>
diff --git a/rss.xml b/rss.xml
index f44a1bc..cc2324e 100644
--- a/rss.xml
+++ b/rss.xml
@@ -5,8 +5,8 @@
<description>TVM - </description>
<link>https://tvm.apache.org</link>
<atom:link href="https://tvm.apache.org" rel="self"
type="application/rss+xml" />
- <lastBuildDate>Mon, 02 Nov 2020 16:31:02 -0800</lastBuildDate>
- <pubDate>Mon, 02 Nov 2020 16:31:02 -0800</pubDate>
+ <lastBuildDate>Tue, 03 Nov 2020 09:01:59 -0500</lastBuildDate>
+ <pubDate>Tue, 03 Nov 2020 09:01:59 -0500</pubDate>
<ttl>60</ttl>
@@ -300,7 +300,7 @@ For more documentation about the Bring Your Own Datatypes framework
</description>
<link>https://tvm.apache.org/2020/09/26/bring-your-own-datatypes</link>
<guid>https://tvm.apache.org/2020/09/26/bring-your-own-datatypes</guid>
- <pubDate>Sat, 26 Sep 2020 00:00:00 -0700</pubDate>
+ <pubDate>Sat, 26 Sep 2020 00:00:00 -0400</pubDate>
</item>
<item>
@@ -779,7 +779,7 @@ Figure 4: After Graph Partitioning.
</description>
<link>https://tvm.apache.org/2020/07/15/how-to-bring-your-own-codegen-to-tvm</link>
<guid>https://tvm.apache.org/2020/07/15/how-to-bring-your-own-codegen-to-tvm</guid>
- <pubDate>Wed, 15 Jul 2020 00:00:00 -0700</pubDate>
+ <pubDate>Wed, 15 Jul 2020 00:00:00 -0400</pubDate>
</item>
<item>
@@ -1302,7 +1302,7 @@ He is a PyTorch core developer and co-authored <a href="https://www.mann
</description>
<link>https://tvm.apache.org/2020/07/14/bert-pytorch-tvm</link>
<guid>https://tvm.apache.org/2020/07/14/bert-pytorch-tvm</guid>
- <pubDate>Tue, 14 Jul 2020 00:00:00 -0700</pubDate>
+ <pubDate>Tue, 14 Jul 2020 00:00:00 -0400</pubDate>
</item>
<item>
@@ -1611,7 +1611,7 @@ Diagram from CMSIS-NN paper showing a 2x2 matrix multiplication microkernel</
</description>
<link>https://tvm.apache.org/2020/06/04/tinyml-how-tvm-is-taming-tiny</link>
<guid>https://tvm.apache.org/2020/06/04/tinyml-how-tvm-is-taming-tiny</guid>
- <pubDate>Thu, 04 Jun 2020 00:00:00 -0700</pubDate>
+ <pubDate>Thu, 04 Jun 2020 00:00:00 -0400</pubDate>
</item>
<item>
@@ -1698,7 +1698,7 @@ Diagram from CMSIS-NN paper showing a 2x2 matrix multiplication microkernel</
</description>
<link>https://tvm.apache.org/2020/05/14/compiling-machine-learning-to-webassembly-and-webgpu</link>
<guid>https://tvm.apache.org/2020/05/14/compiling-machine-learning-to-webassembly-and-webgpu</guid>
- <pubDate>Thu, 14 May 2020 00:00:00 -0700</pubDate>
+ <pubDate>Thu, 14 May 2020 00:00:00 -0400</pubDate>
</item>
<item>
@@ -1800,7 +1800,7 @@ relay_graph = torch_tvm.to_relay(mul, inputs)
</description>
<link>https://tvm.apache.org/2019/05/30/pytorch-frontend</link>
<guid>https://tvm.apache.org/2019/05/30/pytorch-frontend</guid>
- <pubDate>Thu, 30 May 2019 00:00:00 -0700</pubDate>
+ <pubDate>Thu, 30 May 2019 00:00:00 -0400</pubDate>
</item>
<item>
@@ -1872,7 +1872,7 @@ Figure 2. 2D convolution with data layout in NCHW4c and weight layout in OIHW4o4
</div>
<p></p>
-<p>After we have specified the layout of convolution layers, other
operators such as <code class="language-plaintext
highlighter-rouge">add</code> and activations can automatically
adapt to the chosen layout during the <a
href="https://github.com/dmlc/tvm/blob/master/src/relay/pass/alter_op_layout.cc">AlterOpLayout</a>
pass in Relay.
+<p>After we have specified the layout of convolution layers, other
operators such as <code class="language-plaintext
highlighter-rouge">add</code> and activations can automatically
adapt to the chosen layout during the <a
href="https://github.com/apache/incubator-tvm/blob/main/src/relay/pass/alter_op_layout.cc">AlterOpLayout</a>
pass in Relay.
The layout transformation of the weight can be precomputed offline. Therefore,
we can run the whole model in the same layout without extra overhead.</p>
<h2
id="designing-search-space-for-automatic-optimization">Designing
Search Space for Automatic Optimization</h2>
@@ -1933,10 +1933,10 @@ We show that automatic optimization in TVM makes it easy and flexible to support
<h1 id="show-me-the-code">Show Me the Code</h1>
<ul>
<li><a
href="https://github.com/vinx13/tvm-cuda-int8-benchmark">Benchmark</a></li>
- <li><a
href="https://github.com/dmlc/tvm/blob/master/topi/python/topi/cuda/conv2d_int8.py">CUDA
int8 conv2d</a></li>
- <li><a
href="https://github.com/dmlc/tvm/blob/master/topi/python/topi/cuda/group_conv2d_nchw.py">CUDA
int8 group conv2d</a></li>
- <li><a
href="https://github.com/dmlc/tvm/blob/master/topi/python/topi/cuda/dense.py">CUDA
int8 dense</a></li>
- <li><a
href="https://github.com/dmlc/tvm/blob/master/topi/python/topi/cuda/tensor_intrin.py">Tensor
intrinsics declaration</a></li>
+ <li><a
href="https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/cuda/conv2d_int8.py">CUDA
int8 conv2d</a></li>
+ <li><a
href="https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/cuda/group_conv2d_nchw.py">CUDA
int8 group conv2d</a></li>
+ <li><a
href="https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/cuda/dense.py">CUDA
int8 dense</a></li>
+ <li><a
href="https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/cuda/tensor_intrin.py">Tensor
intrinsics declaration</a></li>
</ul>
<h1 id="bio--acknowledgement">Bio &amp;
Acknowledgement</h1>
@@ -1944,7 +1944,7 @@ We show that automatic optimization in TVM makes it easy and flexible to support
</description>
<link>https://tvm.apache.org/2019/04/29/opt-cuda-quantized</link>
<guid>https://tvm.apache.org/2019/04/29/opt-cuda-quantized</guid>
- <pubDate>Mon, 29 Apr 2019 09:00:00 -0700</pubDate>
+ <pubDate>Mon, 29 Apr 2019 12:00:00 -0400</pubDate>
</item>
<item>
@@ -1967,7 +1967,7 @@ We show that automatic optimization in TVM makes it easy and flexible to support
</description>
<link>https://tvm.apache.org/2019/03/18/tvm-apache-announcement</link>
<guid>https://tvm.apache.org/2019/03/18/tvm-apache-announcement</guid>
- <pubDate>Mon, 18 Mar 2019 00:00:00 -0700</pubDate>
+ <pubDate>Mon, 18 Mar 2019 00:00:00 -0400</pubDate>
</item>
<item>
@@ -2113,14 +2113,14 @@ For simplicity the error handling is ignored here, but is important in real appl
</code></pre></div></div>
<p><code class="language-plaintext
highlighter-rouge">gotvm</code> extends the TVM packed function
system to support golang function closures as packed functions.
-<a
href="https://github.com/dmlc/tvm/blob/master/golang/sample">Examples</a>
available to register golang
+<a
href="https://github.com/apache/incubator-tvm/blob/main/golang/sample">Examples</a>
available to register golang
closure as TVM packed function and invoke the same across programming language
barriers.</p>
<h2 id="show-me-the-code">Show me the code</h2>
<ul>
- <li><a
href="https://github.com/dmlc/tvm/blob/master/golang/src">Package
Source</a></li>
- <li><a
href="https://github.com/dmlc/tvm/blob/master/golang/sample">Examples</a></li>
+ <li><a
href="https://github.com/apache/incubator-tvm/blob/main/golang/src">Package
Source</a></li>
+ <li><a
href="https://github.com/apache/incubator-tvm/blob/main/golang/sample">Examples</a></li>
</ul>
<h2 id="references">References</h2>
@@ -2137,7 +2137,7 @@ closure as TVM packed function and invoke the same across programming language b
</description>
<link>https://tvm.apache.org/2019/01/19/Golang</link>
<guid>https://tvm.apache.org/2019/01/19/Golang</guid>
- <pubDate>Sat, 19 Jan 2019 00:00:00 -0800</pubDate>
+ <pubDate>Sat, 19 Jan 2019 00:00:00 -0500</pubDate>
</item>
<item>
@@ -2282,8 +2282,8 @@ Note: x86 doesn’t support a vectorized popcount for this microarchitecture, so
<h2 id="show-me-the-code">Show me the code</h2>
<ul>
- <li><a
href="https://github.com/dmlc/tvm/blob/master/topi/python/topi/nn/bitserial_conv2d.py">TOPI
bitserial convolution</a></li>
- <li><a
href="https://github.com/dmlc/tvm/blob/master/topi/python/topi/arm_cpu/bitserial_conv2d.py">TOPI
ARM cpu bitserial convolution</a></li>
+ <li><a
href="https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/nn/bitserial_conv2d.py">TOPI
bitserial convolution</a></li>
+ <li><a
href="https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/arm_cpu/bitserial_conv2d.py">TOPI
ARM cpu bitserial convolution</a></li>
</ul>
<h2 id="references">References</h2>
@@ -2298,7 +2298,7 @@ Note: x86 doesn’t support a vectorized popcount for this microarchitecture, so
</description>
<link>https://tvm.apache.org/2018/12/18/lowprecision-conv</link>
<guid>https://tvm.apache.org/2018/12/18/lowprecision-conv</guid>
- <pubDate>Tue, 18 Dec 2018 00:00:00 -0800</pubDate>
+ <pubDate>Tue, 18 Dec 2018 00:00:00 -0500</pubDate>
</item>
<item>
@@ -2414,7 +2414,7 @@ His research interest is in the general domain of ML on shared private data, but
</description>
<link>https://tvm.apache.org/2018/10/09/ml-in-tees</link>
<guid>https://tvm.apache.org/2018/10/09/ml-in-tees</guid>
- <pubDate>Tue, 09 Oct 2018 00:00:00 -0700</pubDate>
+ <pubDate>Tue, 09 Oct 2018 00:00:00 -0400</pubDate>
</item>
<item>
@@ -2808,7 +2808,7 @@ for inference deployment. TVM just provides such a solution.</p>
</description>
<link>https://tvm.apache.org/2018/10/03/auto-opt-all</link>
<guid>https://tvm.apache.org/2018/10/03/auto-opt-all</guid>
- <pubDate>Wed, 03 Oct 2018 00:00:00 -0700</pubDate>
+ <pubDate>Wed, 03 Oct 2018 00:00:00 -0400</pubDate>
</item>
<item>
@@ -2923,7 +2923,7 @@ found <a href="https://tvm.apache.org/docs//tutorials/optimize/opt_gemm.
</code></pre></div></div>
<h2 id="under-the-hood-of-the-pytorch-example">Under the hood
of the PyTorch Example</h2>
-<p>As TVM provides <a
href="https://github.com/dmlc/tvm/blob/master/include/tvm/runtime/c_runtime_api.h#L455">functions</a>
to convert dlpack tensors to tvm <code class="language-plaintext
highlighter-rouge">NDArray</code>s and
+<p>As TVM provides <a
href="https://github.com/apache/incubator-tvm/blob/main/include/tvm/runtime/c_runtime_api.h#L455">functions</a>
to convert dlpack tensors to tvm <code class="language-plaintext
highlighter-rouge">NDArray</code>s and
vice-versa, so all that is needed is some syntactic sugar by wrapping
functions.
<code class="language-plaintext
highlighter-rouge">convert_func</code> is a generic converter for
frameworks using tensors with dlpack
support, and can be used to implement convenient converters, such as
@@ -2947,7 +2947,7 @@ support, and can be used to implement convenient converters, such as
</description>
<link>https://tvm.apache.org/2018/08/10/DLPack-Bridge</link>
<guid>https://tvm.apache.org/2018/08/10/DLPack-Bridge</guid>
- <pubDate>Fri, 10 Aug 2018 00:00:00 -0700</pubDate>
+ <pubDate>Fri, 10 Aug 2018 00:00:00 -0400</pubDate>
</item>
<item>
@@ -2962,7 +2962,7 @@ support, and can be used to implement convenient converters, such as
<p>VTA is more than a standalone accelerator design: it’s an end-to-end
solution that includes drivers, a JIT runtime, and an optimizing compiler stack
based on TVM. The current release includes a behavioral hardware simulator, as
well as the infrastructure to deploy VTA on low-cost FPGA hardware for fast
prototyping. By extending the TVM stack with a customizable, and open source
deep learning hardware accelerator design, we are exposing a transparent
end-to-end deep learning stac [...]
-<p style="text-align: center"><img
src="https://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_stack.png"
alt="image" width="50%" /></p>
+<p style="text-align: center"><img
src="https://raw.githubusercontent.com/uwsampl/web-data/master/vta/blogpost/vta_stack.png"
alt="image" width="50%" /></p>
<p>The VTA and TVM stack together constitute a blueprint for end-to-end,
accelerator-centric deep learning system that can:</p>
@@ -3017,7 +3017,7 @@ The extendability of the compiler stack, combined with the ability to modify the
<p>The Vanilla Tensor Accelerator (VTA) is a generic deep learning
accelerator built around a GEMM core, which performs dense matrix
multiplication at a high computational throughput.
The design is inspired by mainstream deep learning accelerators, of the likes
of Google’s TPU accelerator. The design adopts decoupled access-execute to hide
memory access latency and maximize utilization of compute resources. To a
broader extent, VTA can serve as a template deep learning accelerator design,
exposing a clean tensor computation abstraction to the compiler stack.</p>
-<p style="text-align: center"><img
src="https://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_overview.png"
alt="image" width="60%" /></p>
+<p style="text-align: center"><img
src="https://raw.githubusercontent.com/uwsampl/web-data/master/vta/blogpost/vta_overview.png"
alt="image" width="60%" /></p>
<p>The figure above presents a high-level overview of the VTA hardware
organization. VTA is composed of four modules that communicate between each
other via FIFO queues and single-writer/single-reader SRAM memory blocks, to
allow for task-level pipeline parallelism.
The compute module performs both dense linear algebra computation with its
GEMM core, and general computation with its tensor ALU.
@@ -3034,7 +3034,7 @@ The first approach, which doesn’t require special hardware is to run deep lear
This simulator back-end is readily available for developers to experiment with.
The second approach relies on an off-the-shelf and low-cost FPGA development
board – the <a href="http://www.pynq.io/">Pynq board</a>,
which exposes a reconfigurable FPGA fabric and an ARM SoC.</p>
-<p style="text-align: center"><img
src="https://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_system.png"
alt="image" width="70%" /></p>
+<p style="text-align: center"><img
src="https://raw.githubusercontent.com/uwsampl/web-data/master/vta/blogpost/vta_system.png"
alt="image" width="70%" /></p>
<p>The VTA release offers a simple compilation and deployment flow of
the VTA hardware design and TVM workloads on the Pynq platform, with the help
of an RPC server interface.
The RPC server handles FPGA reconfiguration tasks and TVM module invocation
offloading onto the VTA runtime.
@@ -3057,7 +3057,7 @@ While this platform is meant for prototyping (the 2012 FPGA cannot compete with
<p>A popular method used to assess the efficient use of hardware are
roofline diagrams: given a hardware design, how efficiently are different
workloads utilizing the hardware compute and memory resources. The roofline
plot below shows the throughput achieved on different convolution layers of the
ResNet-18 inference benchmark. Each layer has a different arithmetic intensity,
i.e. compute to data movement ratio.
In the left half, convolution layers are bandwidth limited, whereas on the
right half, they are compute limited.</p>
-<p style="text-align: center"><img
src="https://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_roofline.png"
alt="image" width="60%" /></p>
+<p style="text-align: center"><img
src="https://raw.githubusercontent.com/uwsampl/web-data/master/vta/blogpost/vta_roofline.png"
alt="image" width="60%" /></p>
<p>The goal behind designing a hardware architecture, and a compiler
stack is to bring each workload as close as possible to the roofline of the
target hardware.
The roofline plot shows the effects of having the hardware and compiler work
together to maximize utilization of the available hardware resources.
@@ -3066,7 +3066,7 @@ The result is an overall higher utilization of the available compute and memory
<h3 id="end-to-end-resnet-18-evaluation">End to end ResNet-18
evaluation</h3>
-<p style="text-align: center"><img
src="https://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_e2e.png"
alt="image" width="60%" /></p>
+<p style="text-align: center"><img
src="https://raw.githubusercontent.com/uwsampl/web-data/master/vta/blogpost/vta_e2e.png"
alt="image" width="60%" /></p>
<p>A benefit of having a complete compiler stack built for VTA is the
ability to run end-to-end workloads. This is compelling in the context of
hardware acceleration because we need to understand what performance
bottlenecks, and Amdahl limitations stand in the way to obtaining faster
performance.
The bar plot above shows inference performance with and without offloading the
ResNet convolutional layers to the FPGA-based VTA design, on the Pynq board’s
ARM Cortex A9 SoC.
@@ -3089,7 +3089,7 @@ This kind of high-level visibility is essential to system designers who want to
</description>
<link>https://tvm.apache.org/2018/07/12/vta-release-announcement</link>
<guid>https://tvm.apache.org/2018/07/12/vta-release-announcement</guid>
- <pubDate>Thu, 12 Jul 2018 00:00:00 -0700</pubDate>
+ <pubDate>Thu, 12 Jul 2018 00:00:00 -0400</pubDate>
</item>
<item>
@@ -3355,7 +3355,7 @@ C = tvm.compute(
</description>
<link>https://tvm.apache.org/2018/03/23/nmt-transformer-optimize</link>
<guid>https://tvm.apache.org/2018/03/23/nmt-transformer-optimize</guid>
- <pubDate>Fri, 23 Mar 2018 00:00:00 -0700</pubDate>
+ <pubDate>Fri, 23 Mar 2018 00:00:00 -0400</pubDate>
</item>
<item>
@@ -3471,7 +3471,7 @@ optimizations into the TVM stack.</p>
</description>
<link>https://tvm.apache.org/2018/03/12/webgl</link>
<guid>https://tvm.apache.org/2018/03/12/webgl</guid>
- <pubDate>Mon, 12 Mar 2018 00:00:00 -0700</pubDate>
+ <pubDate>Mon, 12 Mar 2018 00:00:00 -0400</pubDate>
</item>
<item>
@@ -4045,7 +4045,7 @@ advice and <a href="https://github.com/yzhliu">Yizhi Liu</a&g
</description>
<link>https://tvm.apache.org/2018/01/16/opt-mali-gpu</link>
<guid>https://tvm.apache.org/2018/01/16/opt-mali-gpu</guid>
- <pubDate>Tue, 16 Jan 2018 00:00:00 -0800</pubDate>
+ <pubDate>Tue, 16 Jan 2018 00:00:00 -0500</pubDate>
</item>
<item>
@@ -4273,7 +4273,7 @@ make jvminstall
</description>
<link>https://tvm.apache.org/2017/11/08/android-rpc-introduction</link>
<guid>https://tvm.apache.org/2017/11/08/android-rpc-introduction</guid>
- <pubDate>Wed, 08 Nov 2017 00:00:00 -0800</pubDate>
+ <pubDate>Wed, 08 Nov 2017 00:00:00 -0500</pubDate>
</item>
<item>
@@ -4334,7 +4334,7 @@ TVM prediction top-1: 282 tiger
cat</code></pre></figure>
<h2 id="a-note-on-performance">A Note on performance</h2>
-<p>The current support on ROCm focuses on the functionality coverage. We
have already seen promising performance results by simply adopting existing TVM
schedules for CUDA backend. For example, you can try running <a
href="https://github.com/dmlc/tvm/blob/master/topi/recipe/gemm/cuda_gemm_square.py">the
gemm test script</a> in the TVM repository and see the result. For two
types of cards we tested, the current gemm recipe for square matrix
multiplication (not [...]
+<p>The current support on ROCm focuses on the functionality coverage. We
have already seen promising performance results by simply adopting existing TVM
schedules for CUDA backend. For example, you can try running <a
href="https://github.com/apache/incubator-tvm/blob/main/topi/recipe/gemm/cuda_gemm_square.py">the
gemm test script</a> in the TVM repository and see the result. For two
types of cards we tested, the current gemm recipe for square matrix multiplica
[...]
This is already a promising start, as it is very hard to optimize performance
to get to peak and we
did not yet apply AMD GPU specific optimizations.
We are starting to look at performance optimization and we expect more
improvement to come.</p>
@@ -4499,7 +4499,7 @@ BB0_6:
</description>
<link>https://tvm.apache.org/2017/10/30/Bringing-AMDGPUs-to-TVM-Stack-and-NNVM-Compiler-with-ROCm</link>
<guid>https://tvm.apache.org/2017/10/30/Bringing-AMDGPUs-to-TVM-Stack-and-NNVM-Compiler-with-ROCm</guid>
- <pubDate>Mon, 30 Oct 2017 00:00:00 -0700</pubDate>
+ <pubDate>Mon, 30 Oct 2017 00:00:00 -0400</pubDate>
</item>
<item>
@@ -4582,7 +4582,7 @@ We also learns from Halide when implementing the lowering pipeline in TVM.</l
</description>
<link>https://tvm.apache.org/2017/10/06/nnvm-compiler-announcement</link>
<guid>https://tvm.apache.org/2017/10/06/nnvm-compiler-announcement</guid>
- <pubDate>Fri, 06 Oct 2017 08:30:00 -0700</pubDate>
+ <pubDate>Fri, 06 Oct 2017 11:30:00 -0400</pubDate>
</item>