This is an automated email from the ASF dual-hosted git repository.

tqchen pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/incubator-tvm-site.git


The following commit(s) were added to refs/heads/asf-site by this push:
     new 2330d86  Build at Tue Nov  3 09:02:03 EST 2020
2330d86 is described below

commit 2330d862e2d490be1c9e5633de8b550e14182c52
Author: tqchen <tianqi.tc...@gmail.com>
AuthorDate: Tue Nov 3 09:02:03 2020 -0500

    Build at Tue Nov  3 09:02:03 EST 2020
---
 2017/08/17/tvm-release-announcement.html           |  2 +-
 ...s-with-TVM-A-Depthwise-Convolution-Example.html |  8 +--
 2017/10/06/nnvm-compiler-announcement.html         |  2 +-
 ...s-to-TVM-Stack-and-NNVM-Compiler-with-ROCm.html |  4 +-
 2017/11/08/android-rpc-introduction.html           |  2 +-
 2018/01/16/opt-mali-gpu.html                       |  2 +-
 2018/03/12/webgl.html                              |  2 +-
 2018/03/23/nmt-transformer-optimize.html           |  2 +-
 2018/07/12/vta-release-announcement.html           | 12 ++--
 2018/08/10/DLPack-Bridge.html                      |  4 +-
 2018/10/03/auto-opt-all.html                       |  2 +-
 2018/10/09/ml-in-tees.html                         |  2 +-
 2018/12/18/lowprecision-conv.html                  |  6 +-
 2019/01/19/Golang.html                             |  8 +--
 2019/03/18/tvm-apache-announcement.html            |  2 +-
 2019/04/29/opt-cuda-quantized.html                 | 12 ++--
 2019/05/30/pytorch-frontend.html                   |  2 +-
 ...machine-learning-to-webassembly-and-webgpu.html |  2 +-
 2020/06/04/tinyml-how-tvm-is-taming-tiny.html      |  2 +-
 2020/07/14/bert-pytorch-tvm.html                   |  2 +-
 .../15/how-to-bring-your-own-codegen-to-tvm.html   |  2 +-
 2020/09/26/bring-your-own-datatypes.html           |  2 +-
 atom.xml                                           | 76 ++++++++++-----------
 feed.xml                                           | 40 +++++------
 rss.xml                                            | 78 +++++++++++-----------
 25 files changed, 139 insertions(+), 139 deletions(-)

diff --git a/2017/08/17/tvm-release-announcement.html 
b/2017/08/17/tvm-release-announcement.html
index e5ee2d1..9b83eb3 100644
--- a/2017/08/17/tvm-release-announcement.html
+++ b/2017/08/17/tvm-release-announcement.html
@@ -140,7 +140,7 @@
     <div class="span14 w-100">
       <h1>TVM: An End to End IR Stack for Deploying Deep Learning Workloads on 
Hardware Platforms </h1>
       <p class="post-meta">
-        <time datetime="2017-08-17T12:00:00-07:00" itemprop="datePublished">
+        <time datetime="2017-08-17T15:00:00-04:00" itemprop="datePublished">
           Aug 17, 2017
         </time>
         
diff --git 
a/2017/08/22/Optimize-Deep-Learning-GPU-Operators-with-TVM-A-Depthwise-Convolution-Example.html
 
b/2017/08/22/Optimize-Deep-Learning-GPU-Operators-with-TVM-A-Depthwise-Convolution-Example.html
index 5d0fa56..a03a6bf 100644
--- 
a/2017/08/22/Optimize-Deep-Learning-GPU-Operators-with-TVM-A-Depthwise-Convolution-Example.html
+++ 
b/2017/08/22/Optimize-Deep-Learning-GPU-Operators-with-TVM-A-Depthwise-Convolution-Example.html
@@ -140,7 +140,7 @@
     <div class="span14 w-100">
       <h1>Optimize Deep Learning GPU Operators with TVM: A Depthwise 
Convolution Example </h1>
       <p class="post-meta">
-        <time datetime="2017-08-22T00:00:00-07:00" itemprop="datePublished">
+        <time datetime="2017-08-22T00:00:00-04:00" itemprop="datePublished">
           Aug 22, 2017
         </time>
         
@@ -705,9 +705,9 @@ Below is the result with Input = [1, 256, 96, 96], Filter = 
[256, 1, 3, 3], stri
 
 <h2 id="show-me-the-code">Show me the code</h2>
 <ul>
-  <li>Declare: <a 
href="https://github.com/dmlc/tvm/blob/master/topi/python/topi/nn/depthwise_conv2d.py";>https://github.com/dmlc/tvm/blob/master/topi/python/topi/nn/depthwise_conv2d.py</a></li>
-  <li>Schedule: <a 
href="https://github.com/dmlc/tvm/blob/master/topi/python/topi/cuda/depthwise_conv2d.py";>https://github.com/dmlc/tvm/blob/master/topi/python/topi/cuda/depthwise_conv2d.py</a></li>
-  <li>Test: <a 
href="https://github.com/dmlc/tvm/blob/master/topi/recipe/conv/depthwise_conv2d_test.py";>https://github.com/dmlc/tvm/blob/master/topi/recipe/conv/depthwise_conv2d_test.py</a></li>
+  <li>Declare: <a 
href="https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/nn/depthwise_conv2d.py";>https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/nn/depthwise_conv2d.py</a></li>
+  <li>Schedule: <a 
href="https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/cuda/depthwise_conv2d.py";>https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/cuda/depthwise_conv2d.py</a></li>
+  <li>Test: <a 
href="https://github.com/apache/incubator-tvm/blob/main/topi/recipe/conv/depthwise_conv2d_test.py";>https://github.com/apache/incubator-tvm/blob/main/topi/recipe/conv/depthwise_conv2d_test.py</a></li>
 </ul>
 
 <h2 id="acknowledgements">Acknowledgements</h2>
diff --git a/2017/10/06/nnvm-compiler-announcement.html 
b/2017/10/06/nnvm-compiler-announcement.html
index d7b9c05..d3eb49f 100644
--- a/2017/10/06/nnvm-compiler-announcement.html
+++ b/2017/10/06/nnvm-compiler-announcement.html
@@ -140,7 +140,7 @@
     <div class="span14 w-100">
       <h1>NNVM Compiler: Open Compiler for AI Frameworks </h1>
       <p class="post-meta">
-        <time datetime="2017-10-06T08:30:00-07:00" itemprop="datePublished">
+        <time datetime="2017-10-06T11:30:00-04:00" itemprop="datePublished">
           Oct 6, 2017
         </time>
         
diff --git 
a/2017/10/30/Bringing-AMDGPUs-to-TVM-Stack-and-NNVM-Compiler-with-ROCm.html 
b/2017/10/30/Bringing-AMDGPUs-to-TVM-Stack-and-NNVM-Compiler-with-ROCm.html
index eb4caed..1b48741 100644
--- a/2017/10/30/Bringing-AMDGPUs-to-TVM-Stack-and-NNVM-Compiler-with-ROCm.html
+++ b/2017/10/30/Bringing-AMDGPUs-to-TVM-Stack-and-NNVM-Compiler-with-ROCm.html
@@ -140,7 +140,7 @@
     <div class="span14 w-100">
       <h1>Bringing AMDGPUs to TVM Stack and NNVM Compiler with ROCm </h1>
       <p class="post-meta">
-        <time datetime="2017-10-30T00:00:00-07:00" itemprop="datePublished">
+        <time datetime="2017-10-30T00:00:00-04:00" itemprop="datePublished">
           Oct 30, 2017
         </time>
         
@@ -204,7 +204,7 @@ TVM prediction top-1: 282 tiger cat</code></pre></figure>
 
 <h2 id="a-note-on-performance">A Note on performance</h2>
 
-<p>The current support on ROCm focuses on the functionality coverage. We have 
already seen promising performance results by simply adopting existing TVM 
schedules for CUDA backend. For example, you can try running <a 
href="https://github.com/dmlc/tvm/blob/master/topi/recipe/gemm/cuda_gemm_square.py";>the
 gemm test script</a> in the TVM repository and see the result. For two types 
of cards we tested, the current gemm recipe for square matrix multiplication 
(not yet specifically optimized f [...]
+<p>The current support on ROCm focuses on the functionality coverage. We have 
already seen promising performance results by simply adopting existing TVM 
schedules for CUDA backend. For example, you can try running <a 
href="https://github.com/apache/incubator-tvm/blob/main/topi/recipe/gemm/cuda_gemm_square.py";>the
 gemm test script</a> in the TVM repository and see the result. For two types 
of cards we tested, the current gemm recipe for square matrix multiplication 
(not yet specifically o [...]
 This is already a promising start, as it is very hard to optimize performance 
to get to peak and we
 did not yet apply AMD GPU specific optimizations.
 We are starting to look at performance optimization and we expect more 
improvement to come.</p>
diff --git a/2017/11/08/android-rpc-introduction.html 
b/2017/11/08/android-rpc-introduction.html
index d354c0a..104829e 100644
--- a/2017/11/08/android-rpc-introduction.html
+++ b/2017/11/08/android-rpc-introduction.html
@@ -140,7 +140,7 @@
     <div class="span14 w-100">
       <h1>Remote Profile and Test Deep Learning Cross Compilation on Mobile 
Phones with TVM RPC </h1>
       <p class="post-meta">
-        <time datetime="2017-11-08T00:00:00-08:00" itemprop="datePublished">
+        <time datetime="2017-11-08T00:00:00-05:00" itemprop="datePublished">
           Nov 8, 2017
         </time>
         
diff --git a/2018/01/16/opt-mali-gpu.html b/2018/01/16/opt-mali-gpu.html
index 814ea6e..71d3d86 100644
--- a/2018/01/16/opt-mali-gpu.html
+++ b/2018/01/16/opt-mali-gpu.html
@@ -140,7 +140,7 @@
     <div class="span14 w-100">
       <h1>Optimizing Mobile Deep Learning on ARM GPU with TVM </h1>
       <p class="post-meta">
-        <time datetime="2018-01-16T00:00:00-08:00" itemprop="datePublished">
+        <time datetime="2018-01-16T00:00:00-05:00" itemprop="datePublished">
           Jan 16, 2018
         </time>
         
diff --git a/2018/03/12/webgl.html b/2018/03/12/webgl.html
index 81e89ac..db05f52 100644
--- a/2018/03/12/webgl.html
+++ b/2018/03/12/webgl.html
@@ -140,7 +140,7 @@
     <div class="span14 w-100">
       <h1>Compiling Deep Learning Models to WebGL with TVM </h1>
       <p class="post-meta">
-        <time datetime="2018-03-12T00:00:00-07:00" itemprop="datePublished">
+        <time datetime="2018-03-12T00:00:00-04:00" itemprop="datePublished">
           Mar 12, 2018
         </time>
         
diff --git a/2018/03/23/nmt-transformer-optimize.html 
b/2018/03/23/nmt-transformer-optimize.html
index 7dd4172..9ec078f 100644
--- a/2018/03/23/nmt-transformer-optimize.html
+++ b/2018/03/23/nmt-transformer-optimize.html
@@ -140,7 +140,7 @@
     <div class="span14 w-100">
       <h1>Bringing TVM into TensorFlow for Optimizing Neural Machine 
Translation on GPU </h1>
       <p class="post-meta">
-        <time datetime="2018-03-23T00:00:00-07:00" itemprop="datePublished">
+        <time datetime="2018-03-23T00:00:00-04:00" itemprop="datePublished">
           Mar 23, 2018
         </time>
         
diff --git a/2018/07/12/vta-release-announcement.html 
b/2018/07/12/vta-release-announcement.html
index a4b1dd0..7155faa 100644
--- a/2018/07/12/vta-release-announcement.html
+++ b/2018/07/12/vta-release-announcement.html
@@ -140,7 +140,7 @@
     <div class="span14 w-100">
       <h1>VTA: An Open, Customizable Deep Learning Acceleration Stack  </h1>
       <p class="post-meta">
-        <time datetime="2018-07-12T00:00:00-07:00" itemprop="datePublished">
+        <time datetime="2018-07-12T00:00:00-04:00" itemprop="datePublished">
           Jul 12, 2018
         </time>
         
@@ -158,7 +158,7 @@
 
 <p>VTA is more than a standalone accelerator design: it’s an end-to-end 
solution that includes drivers, a JIT runtime, and an optimizing compiler stack 
based on TVM. The current release includes a behavioral hardware simulator, as 
well as the infrastructure to deploy VTA on low-cost FPGA hardware for fast 
prototyping. By extending the TVM stack with a customizable, and open source 
deep learning hardware accelerator design, we are exposing a transparent 
end-to-end deep learning stack from [...]
 
-<p style="text-align: center"><img 
src="https://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_stack.png";
 alt="image" width="50%" /></p>
+<p style="text-align: center"><img 
src="https://raw.githubusercontent.com/uwsampl/web-data/master/vta/blogpost/vta_stack.png";
 alt="image" width="50%" /></p>
 
 <p>The VTA and TVM stack together constitute a blueprint for end-to-end, 
accelerator-centric deep learning system that can:</p>
 
@@ -213,7 +213,7 @@ The extendability of the compiler stack, combined with the 
ability to modify the
 <p>The Vanilla Tensor Accelerator (VTA) is a generic deep learning accelerator 
built around a GEMM core, which performs dense matrix multiplication at a high 
computational throughput.
 The design is inspired by mainstream deep learning accelerators, of the likes 
of Google’s TPU accelerator. The design adopts decoupled access-execute to hide 
memory access latency and maximize utilization of compute resources. To a 
broader extent, VTA can serve as a template deep learning accelerator design, 
exposing a clean tensor computation abstraction to the compiler stack.</p>
 
-<p style="text-align: center"><img 
src="https://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_overview.png";
 alt="image" width="60%" /></p>
+<p style="text-align: center"><img 
src="https://raw.githubusercontent.com/uwsampl/web-data/master/vta/blogpost/vta_overview.png";
 alt="image" width="60%" /></p>
 
 <p>The figure above presents a high-level overview of the VTA hardware 
organization. VTA is composed of four modules that communicate between each 
other via FIFO queues and single-writer/single-reader SRAM memory blocks, to 
allow for task-level pipeline parallelism.
 The compute module performs both dense linear algebra computation with its 
GEMM core, and general computation with its tensor ALU.
@@ -230,7 +230,7 @@ The first approach, which doesn’t require special hardware 
is to run deep lear
 This simulator back-end is readily available for developers to experiment with.
 The second approach relies on an off-the-shelf and low-cost FPGA development 
board – the <a href="http://www.pynq.io/";>Pynq board</a>, which exposes a 
reconfigurable FPGA fabric and an ARM SoC.</p>
 
-<p style="text-align: center"><img 
src="https://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_system.png";
 alt="image" width="70%" /></p>
+<p style="text-align: center"><img 
src="https://raw.githubusercontent.com/uwsampl/web-data/master/vta/blogpost/vta_system.png";
 alt="image" width="70%" /></p>
 
 <p>The VTA release offers a simple compilation and deployment flow of the VTA 
hardware design and TVM workloads on the Pynq platform, with the help of an RPC 
server interface.
 The RPC server handles FPGA reconfiguration tasks and TVM module invocation 
offloading onto the VTA runtime.
@@ -253,7 +253,7 @@ While this platform is meant for prototyping (the 2012 FPGA 
cannot compete with
 <p>A popular method used to assess the efficient use of hardware are roofline 
diagrams: given a hardware design, how efficiently are different workloads 
utilizing the hardware compute and memory resources. The roofline plot below 
shows the throughput achieved on different convolution layers of the ResNet-18 
inference benchmark. Each layer has a different arithmetic intensity, i.e. 
compute to data movement ratio.
 In the left half, convolution layers are bandwidth limited, whereas on the 
right half, they are compute limited.</p>
 
-<p style="text-align: center"><img 
src="https://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_roofline.png";
 alt="image" width="60%" /></p>
+<p style="text-align: center"><img 
src="https://raw.githubusercontent.com/uwsampl/web-data/master/vta/blogpost/vta_roofline.png";
 alt="image" width="60%" /></p>
 
 <p>The goal behind designing a hardware architecture, and a compiler stack is 
to bring each workload as close as possible to the roofline of the target 
hardware.
 The roofline plot shows the effects of having the hardware and compiler work 
together to maximize utilization of the available hardware resources.
@@ -262,7 +262,7 @@ The result is an overall higher utilization of the 
available compute and memory
 
 <h3 id="end-to-end-resnet-18-evaluation">End to end ResNet-18 evaluation</h3>
 
-<p style="text-align: center"><img 
src="https://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_e2e.png";
 alt="image" width="60%" /></p>
+<p style="text-align: center"><img 
src="https://raw.githubusercontent.com/uwsampl/web-data/master/vta/blogpost/vta_e2e.png";
 alt="image" width="60%" /></p>
 
 <p>A benefit of having a complete compiler stack built for VTA is the ability 
to run end-to-end workloads. This is compelling in the context of hardware 
acceleration because we need to understand what performance bottlenecks, and 
Amdahl limitations stand in the way to obtaining faster performance.
 The bar plot above shows inference performance with and without offloading the 
ResNet convolutional layers to the FPGA-based VTA design, on the Pynq board’s 
ARM Cortex A9 SoC.
diff --git a/2018/08/10/DLPack-Bridge.html b/2018/08/10/DLPack-Bridge.html
index b64eead..0ec196d 100644
--- a/2018/08/10/DLPack-Bridge.html
+++ b/2018/08/10/DLPack-Bridge.html
@@ -140,7 +140,7 @@
     <div class="span14 w-100">
       <h1>Building a Cross-Framework Deep Learning Compiler via DLPack </h1>
       <p class="post-meta">
-        <time datetime="2018-08-10T00:00:00-07:00" itemprop="datePublished">
+        <time datetime="2018-08-10T00:00:00-04:00" itemprop="datePublished">
           Aug 10, 2018
         </time>
         
@@ -262,7 +262,7 @@ found <a 
href="https://tvm.apache.org/docs//tutorials/optimize/opt_gemm.html";>he
 </code></pre></div></div>
 
 <h2 id="under-the-hood-of-the-pytorch-example">Under the hood of the PyTorch 
Example</h2>
-<p>As TVM provides <a 
href="https://github.com/dmlc/tvm/blob/master/include/tvm/runtime/c_runtime_api.h#L455";>functions</a>
 to convert dlpack tensors to tvm <code class="language-plaintext 
highlighter-rouge">NDArray</code>s and
+<p>As TVM provides <a 
href="https://github.com/apache/incubator-tvm/blob/main/include/tvm/runtime/c_runtime_api.h#L455";>functions</a>
 to convert dlpack tensors to tvm <code class="language-plaintext 
highlighter-rouge">NDArray</code>s and
 vice-versa, so all that is needed is some syntactic sugar by wrapping 
functions.
 <code class="language-plaintext highlighter-rouge">convert_func</code> is a 
generic converter for frameworks using tensors with dlpack
 support, and can be used to implement convenient converters, such as
diff --git a/2018/10/03/auto-opt-all.html b/2018/10/03/auto-opt-all.html
index 005b8fc..f5f1482 100644
--- a/2018/10/03/auto-opt-all.html
+++ b/2018/10/03/auto-opt-all.html
@@ -140,7 +140,7 @@
     <div class="span14 w-100">
       <h1>Automatic Kernel Optimization for Deep Learning on All Hardware 
Platforms </h1>
       <p class="post-meta">
-        <time datetime="2018-10-03T00:00:00-07:00" itemprop="datePublished">
+        <time datetime="2018-10-03T00:00:00-04:00" itemprop="datePublished">
           Oct 3, 2018
         </time>
         
diff --git a/2018/10/09/ml-in-tees.html b/2018/10/09/ml-in-tees.html
index 85f637d..3838be6 100644
--- a/2018/10/09/ml-in-tees.html
+++ b/2018/10/09/ml-in-tees.html
@@ -140,7 +140,7 @@
     <div class="span14 w-100">
       <h1>Efficient Privacy-Preserving ML Using TVM </h1>
       <p class="post-meta">
-        <time datetime="2018-10-09T00:00:00-07:00" itemprop="datePublished">
+        <time datetime="2018-10-09T00:00:00-04:00" itemprop="datePublished">
           Oct 9, 2018
         </time>
         
diff --git a/2018/12/18/lowprecision-conv.html 
b/2018/12/18/lowprecision-conv.html
index e1fafc9..31738a3 100644
--- a/2018/12/18/lowprecision-conv.html
+++ b/2018/12/18/lowprecision-conv.html
@@ -140,7 +140,7 @@
     <div class="span14 w-100">
       <h1>Automating Generation of Low Precision Deep Learning Operators </h1>
       <p class="post-meta">
-        <time datetime="2018-12-18T00:00:00-08:00" itemprop="datePublished">
+        <time datetime="2018-12-18T00:00:00-05:00" itemprop="datePublished">
           Dec 18, 2018
         </time>
         
@@ -292,8 +292,8 @@ Note: x86 doesn’t support a vectorized popcount for this 
microarchitecture, so
 <h2 id="show-me-the-code">Show me the code</h2>
 
 <ul>
-  <li><a 
href="https://github.com/dmlc/tvm/blob/master/topi/python/topi/nn/bitserial_conv2d.py";>TOPI
 bitserial convolution</a></li>
-  <li><a 
href="https://github.com/dmlc/tvm/blob/master/topi/python/topi/arm_cpu/bitserial_conv2d.py";>TOPI
 ARM cpu bitserial convolution</a></li>
+  <li><a 
href="https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/nn/bitserial_conv2d.py";>TOPI
 bitserial convolution</a></li>
+  <li><a 
href="https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/arm_cpu/bitserial_conv2d.py";>TOPI
 ARM cpu bitserial convolution</a></li>
 </ul>
 
 <h2 id="references">References</h2>
diff --git a/2019/01/19/Golang.html b/2019/01/19/Golang.html
index da4cdbd..87e1e11 100644
--- a/2019/01/19/Golang.html
+++ b/2019/01/19/Golang.html
@@ -140,7 +140,7 @@
     <div class="span14 w-100">
       <h1>TVM Golang Runtime for Deep Learning Deployment </h1>
       <p class="post-meta">
-        <time datetime="2019-01-19T00:00:00-08:00" itemprop="datePublished">
+        <time datetime="2019-01-19T00:00:00-05:00" itemprop="datePublished">
           Jan 19, 2019
         </time>
         
@@ -293,14 +293,14 @@ For simplicity the error handling is ignored here, but is 
important in real appl
 </code></pre></div></div>
 
 <p><code class="language-plaintext highlighter-rouge">gotvm</code> extends the 
TVM packed function system to support golang function closures as packed 
functions.
-<a href="https://github.com/dmlc/tvm/blob/master/golang/sample";>Examples</a> 
available to register golang
+<a 
href="https://github.com/apache/incubator-tvm/blob/main/golang/sample";>Examples</a>
 available to register golang
 closure as TVM packed function and invoke the same across programming language 
barriers.</p>
 
 <h2 id="show-me-the-code">Show me the code</h2>
 
 <ul>
-  <li><a href="https://github.com/dmlc/tvm/blob/master/golang/src";>Package 
Source</a></li>
-  <li><a 
href="https://github.com/dmlc/tvm/blob/master/golang/sample";>Examples</a></li>
+  <li><a 
href="https://github.com/apache/incubator-tvm/blob/main/golang/src";>Package 
Source</a></li>
+  <li><a 
href="https://github.com/apache/incubator-tvm/blob/main/golang/sample";>Examples</a></li>
 </ul>
 
 <h2 id="references">References</h2>
diff --git a/2019/03/18/tvm-apache-announcement.html 
b/2019/03/18/tvm-apache-announcement.html
index 012b8e3..0e06763 100644
--- a/2019/03/18/tvm-apache-announcement.html
+++ b/2019/03/18/tvm-apache-announcement.html
@@ -140,7 +140,7 @@
     <div class="span14 w-100">
       <h1>TVM Deep Learning Compiler Joins Apache Software Foundation </h1>
       <p class="post-meta">
-        <time datetime="2019-03-18T00:00:00-07:00" itemprop="datePublished">
+        <time datetime="2019-03-18T00:00:00-04:00" itemprop="datePublished">
           Mar 18, 2019
         </time>
         
diff --git a/2019/04/29/opt-cuda-quantized.html 
b/2019/04/29/opt-cuda-quantized.html
index 8e24619..aebb4c7 100644
--- a/2019/04/29/opt-cuda-quantized.html
+++ b/2019/04/29/opt-cuda-quantized.html
@@ -140,7 +140,7 @@
     <div class="span14 w-100">
       <h1>Automating Optimization of Quantized Deep Learning Models on CUDA 
</h1>
       <p class="post-meta">
-        <time datetime="2019-04-29T09:00:00-07:00" itemprop="datePublished">
+        <time datetime="2019-04-29T12:00:00-04:00" itemprop="datePublished">
           Apr 29, 2019
         </time>
         
@@ -219,7 +219,7 @@ Figure 2. 2D convolution with data layout in NCHW4c and 
weight layout in OIHW4o4
 </div>
 <p></p>
 
-<p>After we have specified the layout of convolution layers, other operators 
such as <code class="language-plaintext highlighter-rouge">add</code> and 
activations can automatically adapt to the chosen layout during the <a 
href="https://github.com/dmlc/tvm/blob/master/src/relay/pass/alter_op_layout.cc";>AlterOpLayout</a>
 pass in Relay.
+<p>After we have specified the layout of convolution layers, other operators 
such as <code class="language-plaintext highlighter-rouge">add</code> and 
activations can automatically adapt to the chosen layout during the <a 
href="https://github.com/apache/incubator-tvm/blob/main/src/relay/pass/alter_op_layout.cc";>AlterOpLayout</a>
 pass in Relay.
 The layout transformation of the weight can be precomputed offline. Therefore, 
we can run the whole model in the same layout without extra overhead.</p>
 
 <h2 id="designing-search-space-for-automatic-optimization">Designing Search 
Space for Automatic Optimization</h2>
@@ -280,10 +280,10 @@ We show that automatic optimization in TVM makes it easy 
and flexible to support
 <h1 id="show-me-the-code">Show Me the Code</h1>
 <ul>
   <li><a 
href="https://github.com/vinx13/tvm-cuda-int8-benchmark";>Benchmark</a></li>
-  <li><a 
href="https://github.com/dmlc/tvm/blob/master/topi/python/topi/cuda/conv2d_int8.py";>CUDA
 int8 conv2d</a></li>
-  <li><a 
href="https://github.com/dmlc/tvm/blob/master/topi/python/topi/cuda/group_conv2d_nchw.py";>CUDA
 int8 group conv2d</a></li>
-  <li><a 
href="https://github.com/dmlc/tvm/blob/master/topi/python/topi/cuda/dense.py";>CUDA
 int8 dense</a></li>
-  <li><a 
href="https://github.com/dmlc/tvm/blob/master/topi/python/topi/cuda/tensor_intrin.py";>Tensor
 intrinsics declaration</a></li>
+  <li><a 
href="https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/cuda/conv2d_int8.py";>CUDA
 int8 conv2d</a></li>
+  <li><a 
href="https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/cuda/group_conv2d_nchw.py";>CUDA
 int8 group conv2d</a></li>
+  <li><a 
href="https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/cuda/dense.py";>CUDA
 int8 dense</a></li>
+  <li><a 
href="https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/cuda/tensor_intrin.py";>Tensor
 intrinsics declaration</a></li>
 </ul>
 
 <h1 id="bio--acknowledgement">Bio &amp; Acknowledgement</h1>
diff --git a/2019/05/30/pytorch-frontend.html b/2019/05/30/pytorch-frontend.html
index fc95ffa..4f1ba30 100644
--- a/2019/05/30/pytorch-frontend.html
+++ b/2019/05/30/pytorch-frontend.html
@@ -140,7 +140,7 @@
     <div class="span14 w-100">
       <h1>Integrating TVM into PyTorch </h1>
       <p class="post-meta">
-        <time datetime="2019-05-30T00:00:00-07:00" itemprop="datePublished">
+        <time datetime="2019-05-30T00:00:00-04:00" itemprop="datePublished">
           May 30, 2019
         </time>
         
diff --git 
a/2020/05/14/compiling-machine-learning-to-webassembly-and-webgpu.html 
b/2020/05/14/compiling-machine-learning-to-webassembly-and-webgpu.html
index 0b08a29..d4e4ec8 100644
--- a/2020/05/14/compiling-machine-learning-to-webassembly-and-webgpu.html
+++ b/2020/05/14/compiling-machine-learning-to-webassembly-and-webgpu.html
@@ -140,7 +140,7 @@
     <div class="span14 w-100">
       <h1>Compiling Machine Learning to WASM and WebGPU with Apache TVM </h1>
       <p class="post-meta">
-        <time datetime="2020-05-14T00:00:00-07:00" itemprop="datePublished">
+        <time datetime="2020-05-14T00:00:00-04:00" itemprop="datePublished">
           May 14, 2020
         </time>
         
diff --git a/2020/06/04/tinyml-how-tvm-is-taming-tiny.html 
b/2020/06/04/tinyml-how-tvm-is-taming-tiny.html
index 586ccf4..08c5e67 100644
--- a/2020/06/04/tinyml-how-tvm-is-taming-tiny.html
+++ b/2020/06/04/tinyml-how-tvm-is-taming-tiny.html
@@ -140,7 +140,7 @@
     <div class="span14 w-100">
       <h1>TinyML - How TVM is Taming Tiny </h1>
       <p class="post-meta">
-        <time datetime="2020-06-04T00:00:00-07:00" itemprop="datePublished">
+        <time datetime="2020-06-04T00:00:00-04:00" itemprop="datePublished">
           Jun 4, 2020
         </time>
         
diff --git a/2020/07/14/bert-pytorch-tvm.html b/2020/07/14/bert-pytorch-tvm.html
index 2b63cf0..43cc791 100644
--- a/2020/07/14/bert-pytorch-tvm.html
+++ b/2020/07/14/bert-pytorch-tvm.html
@@ -140,7 +140,7 @@
     <div class="span14 w-100">
       <h1>Bridging PyTorch and TVM </h1>
       <p class="post-meta">
-        <time datetime="2020-07-14T00:00:00-07:00" itemprop="datePublished">
+        <time datetime="2020-07-14T00:00:00-04:00" itemprop="datePublished">
           Jul 14, 2020
         </time>
         
diff --git a/2020/07/15/how-to-bring-your-own-codegen-to-tvm.html 
b/2020/07/15/how-to-bring-your-own-codegen-to-tvm.html
index c92704c..155ea18 100644
--- a/2020/07/15/how-to-bring-your-own-codegen-to-tvm.html
+++ b/2020/07/15/how-to-bring-your-own-codegen-to-tvm.html
@@ -140,7 +140,7 @@
     <div class="span14 w-100">
       <h1>How to Bring Your Own Codegen to TVM </h1>
       <p class="post-meta">
-        <time datetime="2020-07-15T00:00:00-07:00" itemprop="datePublished">
+        <time datetime="2020-07-15T00:00:00-04:00" itemprop="datePublished">
           Jul 15, 2020
         </time>
         
diff --git a/2020/09/26/bring-your-own-datatypes.html 
b/2020/09/26/bring-your-own-datatypes.html
index 22bcf89..0486f82 100644
--- a/2020/09/26/bring-your-own-datatypes.html
+++ b/2020/09/26/bring-your-own-datatypes.html
@@ -140,7 +140,7 @@
     <div class="span14 w-100">
       <h1>Bring Your Own Datatypes: Enabling Custom Datatype Exploration in 
TVM </h1>
       <p class="post-meta">
-        <time datetime="2020-09-26T00:00:00-07:00" itemprop="datePublished">
+        <time datetime="2020-09-26T00:00:00-04:00" itemprop="datePublished">
           Sep 26, 2020
         </time>
         
diff --git a/atom.xml b/atom.xml
index a68c2a3..3b07147 100644
--- a/atom.xml
+++ b/atom.xml
@@ -4,7 +4,7 @@
  <title>TVM</title>
  <link href="https://tvm.apache.org"; rel="self"/>
  <link href="https://tvm.apache.org"/>
- <updated>2020-11-02T16:31:02-08:00</updated>
+ <updated>2020-11-03T09:01:59-05:00</updated>
  <id>https://tvm.apache.org</id>
  <author>
    <name></name>
@@ -15,7 +15,7 @@
  <entry>
    <title>Bring Your Own Datatypes: Enabling Custom Datatype Exploration in 
TVM</title>
    <link href="https://tvm.apache.org/2020/09/26/bring-your-own-datatypes"/>
-   <updated>2020-09-26T00:00:00-07:00</updated>
+   <updated>2020-09-26T00:00:00-04:00</updated>
    <id>https://tvm.apache.org/2020/09/26/bring-your-own-datatypes</id>
    <content type="html">&lt;p&gt;In this post, we describe the Bring Your Own 
Datatypes framework, which enables the use of custom datatypes within 
TVM.&lt;/p&gt;
 
@@ -308,7 +308,7 @@ For more documentation about the Bring Your Own Datatypes 
framework
  <entry>
    <title>How to Bring Your Own Codegen to TVM</title>
    <link 
href="https://tvm.apache.org/2020/07/15/how-to-bring-your-own-codegen-to-tvm"/>
-   <updated>2020-07-15T00:00:00-07:00</updated>
+   <updated>2020-07-15T00:00:00-04:00</updated>
    
<id>https://tvm.apache.org/2020/07/15/how-to-bring-your-own-codegen-to-tvm</id>
    <content type="html">&lt;p&gt;To free data scientists from worrying about 
the performance when developing a new model, hardware backend providers (e.g., 
Intel, NVIDIA, ARM, etc) either provide kernel libraries such as cuBLAS or 
cuDNN with many commonly used deep learning kernels, or provide frameworks such 
as DNNL or TensorRT with a graph engine to let users describe their models in a 
certain way to achieve high performance. In addition, emerging deep learning 
accelerators also have t [...]
 
@@ -787,7 +787,7 @@ Figure 4: After Graph Partitioning.
  <entry>
    <title>Bridging PyTorch and TVM</title>
    <link href="https://tvm.apache.org/2020/07/14/bert-pytorch-tvm"/>
-   <updated>2020-07-14T00:00:00-07:00</updated>
+   <updated>2020-07-14T00:00:00-04:00</updated>
    <id>https://tvm.apache.org/2020/07/14/bert-pytorch-tvm</id>
    <content type="html">
 &lt;p&gt;(A more code-heavy variant is crossposted on the more PyTorch affine 
&lt;a 
href=&quot;https://lernapparat.de/transformers-pytorch-tvm/&quot;&gt;Lernapparat&lt;/a&gt;,
@@ -1310,7 +1310,7 @@ He is a PyTorch core developer and co-authored &lt;a 
href=&quot;https://www.mann
  <entry>
    <title>TinyML - How TVM is Taming Tiny</title>
    <link 
href="https://tvm.apache.org/2020/06/04/tinyml-how-tvm-is-taming-tiny"/>
-   <updated>2020-06-04T00:00:00-07:00</updated>
+   <updated>2020-06-04T00:00:00-04:00</updated>
    <id>https://tvm.apache.org/2020/06/04/tinyml-how-tvm-is-taming-tiny</id>
    <content type="html">
 &lt;p&gt;&lt;img src=&quot;/images/microtvm/logo.png&quot; alt=&quot;microTVM 
logo&quot; width=&quot;30%&quot; /&gt;&lt;br /&gt;&lt;/p&gt;
@@ -1619,7 +1619,7 @@ Diagram from CMSIS-NN paper showing a 2x2 matrix 
multiplication microkernel&lt;/
  <entry>
    <title>Compiling Machine Learning to WASM and WebGPU with Apache TVM</title>
    <link 
href="https://tvm.apache.org/2020/05/14/compiling-machine-learning-to-webassembly-and-webgpu"/>
-   <updated>2020-05-14T00:00:00-07:00</updated>
+   <updated>2020-05-14T00:00:00-04:00</updated>
    
<id>https://tvm.apache.org/2020/05/14/compiling-machine-learning-to-webassembly-and-webgpu</id>
    <content type="html">&lt;p&gt;&lt;strong&gt;TLDR&lt;/strong&gt;&lt;/p&gt;
 
@@ -1706,7 +1706,7 @@ Diagram from CMSIS-NN paper showing a 2x2 matrix 
multiplication microkernel&lt;/
  <entry>
    <title>Integrating TVM into PyTorch</title>
    <link href="https://tvm.apache.org/2019/05/30/pytorch-frontend"/>
-   <updated>2019-05-30T00:00:00-07:00</updated>
+   <updated>2019-05-30T00:00:00-04:00</updated>
    <id>https://tvm.apache.org/2019/05/30/pytorch-frontend</id>
    <content type="html">&lt;p&gt;As TVM continuously demonstrates improvements 
to the efficiency of deep learning execution,
 it has become clear that PyTorch stands to benefit from directly leveraging 
the compiler stack.
@@ -1808,7 +1808,7 @@ relay_graph = torch_tvm.to_relay(mul, inputs)
  <entry>
    <title>Automating Optimization of Quantized Deep Learning Models on 
CUDA</title>
    <link href="https://tvm.apache.org/2019/04/29/opt-cuda-quantized"/>
-   <updated>2019-04-29T09:00:00-07:00</updated>
+   <updated>2019-04-29T12:00:00-04:00</updated>
    <id>https://tvm.apache.org/2019/04/29/opt-cuda-quantized</id>
    <content type="html">&lt;p&gt;Deep learning has been successfully applied 
to a variety of tasks.
 On real-time scenarios such as inference on autonomous vehicles, the inference 
speed of the model is critical.
@@ -1877,7 +1877,7 @@ Figure 2. 2D convolution with data layout in NCHW4c and 
weight layout in OIHW4o4
 &lt;/div&gt;
 &lt;p&gt;&lt;/p&gt;
 
-&lt;p&gt;After we have specified the layout of convolution layers, other 
operators such as &lt;code class=&quot;language-plaintext 
highlighter-rouge&quot;&gt;add&lt;/code&gt; and activations can automatically 
adapt to the chosen layout during the &lt;a 
href=&quot;https://github.com/dmlc/tvm/blob/master/src/relay/pass/alter_op_layout.cc&quot;&gt;AlterOpLayout&lt;/a&gt;
 pass in Relay.
+&lt;p&gt;After we have specified the layout of convolution layers, other 
operators such as &lt;code class=&quot;language-plaintext 
highlighter-rouge&quot;&gt;add&lt;/code&gt; and activations can automatically 
adapt to the chosen layout during the &lt;a 
href=&quot;https://github.com/apache/incubator-tvm/blob/main/src/relay/pass/alter_op_layout.cc&quot;&gt;AlterOpLayout&lt;/a&gt;
 pass in Relay.
 The layout transformation of the weight can be precomputed offline. Therefore, 
we can run the whole model in the same layout without extra overhead.&lt;/p&gt;
 
 &lt;h2 
id=&quot;designing-search-space-for-automatic-optimization&quot;&gt;Designing 
Search Space for Automatic Optimization&lt;/h2&gt;
@@ -1938,10 +1938,10 @@ We show that automatic optimization in TVM makes it 
easy and flexible to support
 &lt;h1 id=&quot;show-me-the-code&quot;&gt;Show Me the Code&lt;/h1&gt;
 &lt;ul&gt;
   &lt;li&gt;&lt;a 
href=&quot;https://github.com/vinx13/tvm-cuda-int8-benchmark&quot;&gt;Benchmark&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;&lt;a 
href=&quot;https://github.com/dmlc/tvm/blob/master/topi/python/topi/cuda/conv2d_int8.py&quot;&gt;CUDA
 int8 conv2d&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;&lt;a 
href=&quot;https://github.com/dmlc/tvm/blob/master/topi/python/topi/cuda/group_conv2d_nchw.py&quot;&gt;CUDA
 int8 group conv2d&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;&lt;a 
href=&quot;https://github.com/dmlc/tvm/blob/master/topi/python/topi/cuda/dense.py&quot;&gt;CUDA
 int8 dense&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;&lt;a 
href=&quot;https://github.com/dmlc/tvm/blob/master/topi/python/topi/cuda/tensor_intrin.py&quot;&gt;Tensor
 intrinsics declaration&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;&lt;a 
href=&quot;https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/cuda/conv2d_int8.py&quot;&gt;CUDA
 int8 conv2d&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;&lt;a 
href=&quot;https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/cuda/group_conv2d_nchw.py&quot;&gt;CUDA
 int8 group conv2d&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;&lt;a 
href=&quot;https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/cuda/dense.py&quot;&gt;CUDA
 int8 dense&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;&lt;a 
href=&quot;https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/cuda/tensor_intrin.py&quot;&gt;Tensor
 intrinsics declaration&lt;/a&gt;&lt;/li&gt;
 &lt;/ul&gt;
 
 &lt;h1 id=&quot;bio--acknowledgement&quot;&gt;Bio &amp;amp; 
Acknowledgement&lt;/h1&gt;
@@ -1952,7 +1952,7 @@ We show that automatic optimization in TVM makes it easy 
and flexible to support
  <entry>
    <title>TVM Deep Learning Compiler Joins Apache Software Foundation</title>
    <link href="https://tvm.apache.org/2019/03/18/tvm-apache-announcement"/>
-   <updated>2019-03-18T00:00:00-07:00</updated>
+   <updated>2019-03-18T00:00:00-04:00</updated>
    <id>https://tvm.apache.org/2019/03/18/tvm-apache-announcement</id>
    <content type="html">&lt;p&gt;There is an increasing need to bring machine 
learning to a wide diversity of hardware devices. Current frameworks rely on 
vendor-specific operator libraries and optimize for a narrow range of 
server-class GPUs. Deploying workloads to new platforms – such as mobile 
phones, embedded devices, and accelerators (e.g., FPGAs, ASICs) – requires 
significant manual effort.&lt;/p&gt;
 
@@ -1975,7 +1975,7 @@ We show that automatic optimization in TVM makes it easy 
and flexible to support
  <entry>
    <title>TVM Golang Runtime for Deep Learning Deployment</title>
    <link href="https://tvm.apache.org/2019/01/19/Golang"/>
-   <updated>2019-01-19T00:00:00-08:00</updated>
+   <updated>2019-01-19T00:00:00-05:00</updated>
    <id>https://tvm.apache.org/2019/01/19/Golang</id>
    <content type="html">&lt;h2 
id=&quot;introduction&quot;&gt;Introduction&lt;/h2&gt;
 
@@ -2118,14 +2118,14 @@ For simplicity the error handling is ignored here, but 
is important in real appl
 &lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
 
 &lt;p&gt;&lt;code class=&quot;language-plaintext 
highlighter-rouge&quot;&gt;gotvm&lt;/code&gt; extends the TVM packed function 
system to support golang function closures as packed functions.
-&lt;a 
href=&quot;https://github.com/dmlc/tvm/blob/master/golang/sample&quot;&gt;Examples&lt;/a&gt;
 available to register golang
+&lt;a 
href=&quot;https://github.com/apache/incubator-tvm/blob/main/golang/sample&quot;&gt;Examples&lt;/a&gt;
 available to register golang
 closure as TVM packed function and invoke the same across programming language 
barriers.&lt;/p&gt;
 
 &lt;h2 id=&quot;show-me-the-code&quot;&gt;Show me the code&lt;/h2&gt;
 
 &lt;ul&gt;
-  &lt;li&gt;&lt;a 
href=&quot;https://github.com/dmlc/tvm/blob/master/golang/src&quot;&gt;Package 
Source&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;&lt;a 
href=&quot;https://github.com/dmlc/tvm/blob/master/golang/sample&quot;&gt;Examples&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;&lt;a 
href=&quot;https://github.com/apache/incubator-tvm/blob/main/golang/src&quot;&gt;Package
 Source&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;&lt;a 
href=&quot;https://github.com/apache/incubator-tvm/blob/main/golang/sample&quot;&gt;Examples&lt;/a&gt;&lt;/li&gt;
 &lt;/ul&gt;
 
 &lt;h2 id=&quot;references&quot;&gt;References&lt;/h2&gt;
@@ -2145,7 +2145,7 @@ closure as TVM packed function and invoke the same across 
programming language b
  <entry>
    <title>Automating Generation of Low Precision Deep Learning 
Operators</title>
    <link href="https://tvm.apache.org/2018/12/18/lowprecision-conv"/>
-   <updated>2018-12-18T00:00:00-08:00</updated>
+   <updated>2018-12-18T00:00:00-05:00</updated>
    <id>https://tvm.apache.org/2018/12/18/lowprecision-conv</id>
    <content type="html">&lt;p&gt;As deep learning models grow larger and more 
complex, deploying them on low powered phone and IoT
 devices becomes challenging because of their limited compute and energy 
budgets. A  recent  trend
@@ -2287,8 +2287,8 @@ Note: x86 doesn’t support a vectorized popcount for this 
microarchitecture, so
 &lt;h2 id=&quot;show-me-the-code&quot;&gt;Show me the code&lt;/h2&gt;
 
 &lt;ul&gt;
-  &lt;li&gt;&lt;a 
href=&quot;https://github.com/dmlc/tvm/blob/master/topi/python/topi/nn/bitserial_conv2d.py&quot;&gt;TOPI
 bitserial convolution&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;&lt;a 
href=&quot;https://github.com/dmlc/tvm/blob/master/topi/python/topi/arm_cpu/bitserial_conv2d.py&quot;&gt;TOPI
 ARM cpu bitserial convolution&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;&lt;a 
href=&quot;https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/nn/bitserial_conv2d.py&quot;&gt;TOPI
 bitserial convolution&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;&lt;a 
href=&quot;https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/arm_cpu/bitserial_conv2d.py&quot;&gt;TOPI
 ARM cpu bitserial convolution&lt;/a&gt;&lt;/li&gt;
 &lt;/ul&gt;
 
 &lt;h2 id=&quot;references&quot;&gt;References&lt;/h2&gt;
@@ -2306,7 +2306,7 @@ Note: x86 doesn’t support a vectorized popcount for this 
microarchitecture, so
  <entry>
    <title>Efficient Privacy-Preserving ML Using TVM</title>
    <link href="https://tvm.apache.org/2018/10/09/ml-in-tees"/>
-   <updated>2018-10-09T00:00:00-07:00</updated>
+   <updated>2018-10-09T00:00:00-04:00</updated>
    <id>https://tvm.apache.org/2018/10/09/ml-in-tees</id>
    <content type="html">&lt;p&gt;This post describes Myelin, a framework for 
privacy-preserving machine learning in trusted hardware enclaves, and how TVM 
makes Myelin fast.
 The key idea is that TVM, unlike other popular ML frameworks, compiles models 
into lightweight, optimized, and dependency-free libraries which can fit into 
resource constrained enclaves.&lt;/p&gt;
@@ -2422,7 +2422,7 @@ His research interest is in the general domain of ML on 
shared private data, but
  <entry>
    <title>Automatic Kernel Optimization for Deep Learning on All Hardware 
Platforms</title>
    <link href="https://tvm.apache.org/2018/10/03/auto-opt-all"/>
-   <updated>2018-10-03T00:00:00-07:00</updated>
+   <updated>2018-10-03T00:00:00-04:00</updated>
    <id>https://tvm.apache.org/2018/10/03/auto-opt-all</id>
    <content type="html">&lt;p&gt;Optimizing the performance of deep neural 
network on a diverse range of hardware platforms is still a hard
 problem for AI developers. In terms of system support, we are facing a 
many-to-many problem here:
@@ -2816,7 +2816,7 @@ for inference deployment. TVM just provides such a 
solution.&lt;/p&gt;
  <entry>
    <title>Building a Cross-Framework Deep Learning Compiler via DLPack</title>
    <link href="https://tvm.apache.org/2018/08/10/DLPack-Bridge"/>
-   <updated>2018-08-10T00:00:00-07:00</updated>
+   <updated>2018-08-10T00:00:00-04:00</updated>
    <id>https://tvm.apache.org/2018/08/10/DLPack-Bridge</id>
    <content type="html">&lt;p&gt;Deep learning frameworks such as Tensorflow, 
PyTorch, and ApacheMxNet provide a
 powerful toolbox for quickly prototyping and deploying deep learning models.
@@ -2928,7 +2928,7 @@ found &lt;a 
href=&quot;https://tvm.apache.org/docs//tutorials/optimize/opt_gemm.
 &lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
 
 &lt;h2 id=&quot;under-the-hood-of-the-pytorch-example&quot;&gt;Under the hood 
of the PyTorch Example&lt;/h2&gt;
-&lt;p&gt;As TVM provides &lt;a 
href=&quot;https://github.com/dmlc/tvm/blob/master/include/tvm/runtime/c_runtime_api.h#L455&quot;&gt;functions&lt;/a&gt;
 to convert dlpack tensors to tvm &lt;code class=&quot;language-plaintext 
highlighter-rouge&quot;&gt;NDArray&lt;/code&gt;s and
+&lt;p&gt;As TVM provides &lt;a 
href=&quot;https://github.com/apache/incubator-tvm/blob/main/include/tvm/runtime/c_runtime_api.h#L455&quot;&gt;functions&lt;/a&gt;
 to convert dlpack tensors to tvm &lt;code class=&quot;language-plaintext 
highlighter-rouge&quot;&gt;NDArray&lt;/code&gt;s and
 vice-versa, so all that is needed is some syntactic sugar by wrapping 
functions.
 &lt;code class=&quot;language-plaintext 
highlighter-rouge&quot;&gt;convert_func&lt;/code&gt; is a generic converter for 
frameworks using tensors with dlpack
 support, and can be used to implement convenient converters, such as
@@ -2955,7 +2955,7 @@ support, and can be used to implement convenient 
converters, such as
  <entry>
    <title>VTA: An Open, Customizable Deep Learning Acceleration Stack </title>
    <link href="https://tvm.apache.org/2018/07/12/vta-release-announcement"/>
-   <updated>2018-07-12T00:00:00-07:00</updated>
+   <updated>2018-07-12T00:00:00-04:00</updated>
    <id>https://tvm.apache.org/2018/07/12/vta-release-announcement</id>
    <content type="html">&lt;p style=&quot;text-align: center&quot;&gt;Thierry 
Moreau(VTA architect), Tianqi Chen(TVM stack), Ziheng Jiang†(graph 
compilation), Luis Vega(cloud deployment)&lt;/p&gt;
 &lt;p style=&quot;text-align: center&quot;&gt;Advisors: Luis Ceze, Carlos 
Guestrin, Arvind Krishnamurthy&lt;/p&gt;
@@ -2967,7 +2967,7 @@ support, and can be used to implement convenient 
converters, such as
 
 &lt;p&gt;VTA is more than a standalone accelerator design: it’s an end-to-end 
solution that includes drivers, a JIT runtime, and an optimizing compiler stack 
based on TVM. The current release includes a behavioral hardware simulator, as 
well as the infrastructure to deploy VTA on low-cost FPGA hardware for fast 
prototyping. By extending the TVM stack with a customizable, and open source 
deep learning hardware accelerator design, we are exposing a transparent 
end-to-end deep learning stac [...]
 
-&lt;p style=&quot;text-align: center&quot;&gt;&lt;img 
src=&quot;https://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_stack.png&quot;
 alt=&quot;image&quot; width=&quot;50%&quot; /&gt;&lt;/p&gt;
+&lt;p style=&quot;text-align: center&quot;&gt;&lt;img 
src=&quot;https://raw.githubusercontent.com/uwsampl/web-data/master/vta/blogpost/vta_stack.png&quot;
 alt=&quot;image&quot; width=&quot;50%&quot; /&gt;&lt;/p&gt;
 
 &lt;p&gt;The VTA and TVM stack together constitute a blueprint for end-to-end, 
accelerator-centric deep learning system that can:&lt;/p&gt;
 
@@ -3022,7 +3022,7 @@ The extendability of the compiler stack, combined with 
the ability to modify the
 &lt;p&gt;The Vanilla Tensor Accelerator (VTA) is a generic deep learning 
accelerator built around a GEMM core, which performs dense matrix 
multiplication at a high computational throughput.
 The design is inspired by mainstream deep learning accelerators, of the likes 
of Google’s TPU accelerator. The design adopts decoupled access-execute to hide 
memory access latency and maximize utilization of compute resources. To a 
broader extent, VTA can serve as a template deep learning accelerator design, 
exposing a clean tensor computation abstraction to the compiler stack.&lt;/p&gt;
 
-&lt;p style=&quot;text-align: center&quot;&gt;&lt;img 
src=&quot;https://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_overview.png&quot;
 alt=&quot;image&quot; width=&quot;60%&quot; /&gt;&lt;/p&gt;
+&lt;p style=&quot;text-align: center&quot;&gt;&lt;img 
src=&quot;https://raw.githubusercontent.com/uwsampl/web-data/master/vta/blogpost/vta_overview.png&quot;
 alt=&quot;image&quot; width=&quot;60%&quot; /&gt;&lt;/p&gt;
 
 &lt;p&gt;The figure above presents a high-level overview of the VTA hardware 
organization. VTA is composed of four modules that communicate between each 
other via FIFO queues and single-writer/single-reader SRAM memory blocks, to 
allow for task-level pipeline parallelism.
 The compute module performs both dense linear algebra computation with its 
GEMM core, and general computation with its tensor ALU.
@@ -3039,7 +3039,7 @@ The first approach, which doesn’t require special 
hardware is to run deep lear
 This simulator back-end is readily available for developers to experiment with.
 The second approach relies on an off-the-shelf and low-cost FPGA development 
board – the &lt;a href=&quot;http://www.pynq.io/&quot;&gt;Pynq board&lt;/a&gt;, 
which exposes a reconfigurable FPGA fabric and an ARM SoC.&lt;/p&gt;
 
-&lt;p style=&quot;text-align: center&quot;&gt;&lt;img 
src=&quot;https://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_system.png&quot;
 alt=&quot;image&quot; width=&quot;70%&quot; /&gt;&lt;/p&gt;
+&lt;p style=&quot;text-align: center&quot;&gt;&lt;img 
src=&quot;https://raw.githubusercontent.com/uwsampl/web-data/master/vta/blogpost/vta_system.png&quot;
 alt=&quot;image&quot; width=&quot;70%&quot; /&gt;&lt;/p&gt;
 
 &lt;p&gt;The VTA release offers a simple compilation and deployment flow of 
the VTA hardware design and TVM workloads on the Pynq platform, with the help 
of an RPC server interface.
 The RPC server handles FPGA reconfiguration tasks and TVM module invocation 
offloading onto the VTA runtime.
@@ -3062,7 +3062,7 @@ While this platform is meant for prototyping (the 2012 
FPGA cannot compete with
 &lt;p&gt;A popular method used to assess the efficient use of hardware are 
roofline diagrams: given a hardware design, how efficiently are different 
workloads utilizing the hardware compute and memory resources. The roofline 
plot below shows the throughput achieved on different convolution layers of the 
ResNet-18 inference benchmark. Each layer has a different arithmetic intensity, 
i.e. compute to data movement ratio.
 In the left half, convolution layers are bandwidth limited, whereas on the 
right half, they are compute limited.&lt;/p&gt;
 
-&lt;p style=&quot;text-align: center&quot;&gt;&lt;img 
src=&quot;https://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_roofline.png&quot;
 alt=&quot;image&quot; width=&quot;60%&quot; /&gt;&lt;/p&gt;
+&lt;p style=&quot;text-align: center&quot;&gt;&lt;img 
src=&quot;https://raw.githubusercontent.com/uwsampl/web-data/master/vta/blogpost/vta_roofline.png&quot;
 alt=&quot;image&quot; width=&quot;60%&quot; /&gt;&lt;/p&gt;
 
 &lt;p&gt;The goal behind designing a hardware architecture, and a compiler 
stack is to bring each workload as close as possible to the roofline of the 
target hardware.
 The roofline plot shows the effects of having the hardware and compiler work 
together to maximize utilization of the available hardware resources.
@@ -3071,7 +3071,7 @@ The result is an overall higher utilization of the 
available compute and memory
 
 &lt;h3 id=&quot;end-to-end-resnet-18-evaluation&quot;&gt;End to end ResNet-18 
evaluation&lt;/h3&gt;
 
-&lt;p style=&quot;text-align: center&quot;&gt;&lt;img 
src=&quot;https://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_e2e.png&quot;
 alt=&quot;image&quot; width=&quot;60%&quot; /&gt;&lt;/p&gt;
+&lt;p style=&quot;text-align: center&quot;&gt;&lt;img 
src=&quot;https://raw.githubusercontent.com/uwsampl/web-data/master/vta/blogpost/vta_e2e.png&quot;
 alt=&quot;image&quot; width=&quot;60%&quot; /&gt;&lt;/p&gt;
 
 &lt;p&gt;A benefit of having a complete compiler stack built for VTA is the 
ability to run end-to-end workloads. This is compelling in the context of 
hardware acceleration because we need to understand what performance 
bottlenecks, and Amdahl limitations stand in the way to obtaining faster 
performance.
 The bar plot above shows inference performance with and without offloading the 
ResNet convolutional layers to the FPGA-based VTA design, on the Pynq board’s 
ARM Cortex A9 SoC.
@@ -3097,7 +3097,7 @@ This kind of high-level visibility is essential to system 
designers who want to
  <entry>
    <title>Bringing TVM into TensorFlow for Optimizing Neural Machine 
Translation on GPU</title>
    <link href="https://tvm.apache.org/2018/03/23/nmt-transformer-optimize"/>
-   <updated>2018-03-23T00:00:00-07:00</updated>
+   <updated>2018-03-23T00:00:00-04:00</updated>
    <id>https://tvm.apache.org/2018/03/23/nmt-transformer-optimize</id>
    <content type="html">&lt;h2 id=&quot;author&quot;&gt;Author&lt;/h2&gt;
 
@@ -3363,7 +3363,7 @@ C = tvm.compute(
  <entry>
    <title>Compiling Deep Learning Models to WebGL with TVM</title>
    <link href="https://tvm.apache.org/2018/03/12/webgl"/>
-   <updated>2018-03-12T00:00:00-07:00</updated>
+   <updated>2018-03-12T00:00:00-04:00</updated>
    <id>https://tvm.apache.org/2018/03/12/webgl</id>
    <content type="html">&lt;p&gt;Now TVM comes with a brand-new OpenGL/WebGL 
backend!
 This blog post explains what it is, and what you can achieve with it.&lt;/p&gt;
@@ -3479,7 +3479,7 @@ optimizations into the TVM stack.&lt;/p&gt;
  <entry>
    <title>Optimizing Mobile Deep Learning on ARM GPU with TVM</title>
    <link href="https://tvm.apache.org/2018/01/16/opt-mali-gpu"/>
-   <updated>2018-01-16T00:00:00-08:00</updated>
+   <updated>2018-01-16T00:00:00-05:00</updated>
    <id>https://tvm.apache.org/2018/01/16/opt-mali-gpu</id>
    <content type="html">&lt;p&gt;With the great success of deep learning, the 
demand for
 deploying deep neural networks to mobile devices is growing rapidly.
@@ -4053,7 +4053,7 @@ advice and &lt;a 
href=&quot;https://github.com/yzhliu&quot;&gt;Yizhi Liu&lt;/a&g
  <entry>
    <title>Remote Profile and Test Deep Learning Cross Compilation on Mobile 
Phones with TVM RPC</title>
    <link href="https://tvm.apache.org/2017/11/08/android-rpc-introduction"/>
-   <updated>2017-11-08T00:00:00-08:00</updated>
+   <updated>2017-11-08T00:00:00-05:00</updated>
    <id>https://tvm.apache.org/2017/11/08/android-rpc-introduction</id>
    <content type="html">&lt;p&gt;TVM stack is an end to end compilation stack 
to deploy deep learning workloads to all hardware backends.
 Thanks to the NNVM compiler support of TVM stack, we can now directly compile 
descriptions from deep learning frameworks and compile them to bare metal code.
@@ -4281,7 +4281,7 @@ make jvminstall
  <entry>
    <title>Bringing AMDGPUs to TVM Stack and NNVM Compiler with ROCm</title>
    <link 
href="https://tvm.apache.org/2017/10/30/Bringing-AMDGPUs-to-TVM-Stack-and-NNVM-Compiler-with-ROCm"/>
-   <updated>2017-10-30T00:00:00-07:00</updated>
+   <updated>2017-10-30T00:00:00-04:00</updated>
    
<id>https://tvm.apache.org/2017/10/30/Bringing-AMDGPUs-to-TVM-Stack-and-NNVM-Compiler-with-ROCm</id>
    <content type="html">&lt;p style=&quot;text-align: center&quot;&gt;Aditya 
Atluri, Advanced Micro Devices, Inc.&lt;/p&gt;
 &lt;p style=&quot;text-align: center&quot;&gt;Masahiro Masuda, Ziosoft, 
Inc.&lt;/p&gt;
@@ -4339,7 +4339,7 @@ TVM prediction top-1: 282 tiger 
cat&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
 
 &lt;h2 id=&quot;a-note-on-performance&quot;&gt;A Note on performance&lt;/h2&gt;
 
-&lt;p&gt;The current support on ROCm focuses on the functionality coverage. We 
have already seen promising performance results by simply adopting existing TVM 
schedules for CUDA backend. For example, you can try running &lt;a 
href=&quot;https://github.com/dmlc/tvm/blob/master/topi/recipe/gemm/cuda_gemm_square.py&quot;&gt;the
 gemm test script&lt;/a&gt; in the TVM repository and see the result. For two 
types of cards we tested, the current gemm recipe for square matrix 
multiplication (not  [...]
+&lt;p&gt;The current support on ROCm focuses on the functionality coverage. We 
have already seen promising performance results by simply adopting existing TVM 
schedules for CUDA backend. For example, you can try running &lt;a 
href=&quot;https://github.com/apache/incubator-tvm/blob/main/topi/recipe/gemm/cuda_gemm_square.py&quot;&gt;the
 gemm test script&lt;/a&gt; in the TVM repository and see the result. For two 
types of cards we tested, the current gemm recipe for square matrix multiplica 
[...]
 This is already a promising start, as it is very hard to optimize performance 
to get to peak and we
 did not yet apply AMD GPU specific optimizations.
 We are starting to look at performance optimization and we expect more 
improvement to come.&lt;/p&gt;
@@ -4507,7 +4507,7 @@ BB0_6:
  <entry>
    <title>NNVM Compiler: Open Compiler for AI Frameworks</title>
    <link href="https://tvm.apache.org/2017/10/06/nnvm-compiler-announcement"/>
-   <updated>2017-10-06T08:30:00-07:00</updated>
+   <updated>2017-10-06T11:30:00-04:00</updated>
    <id>https://tvm.apache.org/2017/10/06/nnvm-compiler-announcement</id>
    <content type="html">&lt;p style=&quot;text-align: center&quot;&gt;Paul G. 
Allen School of Computer Science &amp;amp; Engineering, University of 
Washington&lt;/p&gt;
 &lt;p style=&quot;text-align: center&quot;&gt;Amazon Web Service AI 
team&lt;/p&gt;
diff --git a/feed.xml b/feed.xml
index 0f56d75..2406202 100644
--- a/feed.xml
+++ b/feed.xml
@@ -1,4 +1,4 @@
-<?xml version="1.0" encoding="utf-8"?><feed 
xmlns="http://www.w3.org/2005/Atom"; ><generator uri="https://jekyllrb.com/"; 
version="4.1.1">Jekyll</generator><link href="/feed.xml" rel="self" 
type="application/atom+xml" /><link href="/" rel="alternate" type="text/html" 
/><updated>2020-11-02T16:31:02-08:00</updated><id>/feed.xml</id><title 
type="html">TVM</title><author><name>{&quot;name&quot;=&gt;nil}</name></author><entry><title
 type="html">Bring Your Own Datatypes: Enabling Custom Datatype [...]
+<?xml version="1.0" encoding="utf-8"?><feed 
xmlns="http://www.w3.org/2005/Atom"; ><generator uri="https://jekyllrb.com/"; 
version="4.1.1">Jekyll</generator><link href="/feed.xml" rel="self" 
type="application/atom+xml" /><link href="/" rel="alternate" type="text/html" 
/><updated>2020-11-03T09:01:59-05:00</updated><id>/feed.xml</id><title 
type="html">TVM</title><author><name>{&quot;name&quot;=&gt;nil}</name></author><entry><title
 type="html">Bring Your Own Datatypes: Enabling Custom Datatype [...]
 
 &lt;h2 id=&quot;introduction&quot;&gt;Introduction&lt;/h2&gt;
 
@@ -282,7 +282,7 @@ For more documentation about the Bring Your Own Datatypes 
framework
       &lt;p&gt;&lt;a 
href=&quot;https://posithub.org/docs/BeatingFloatingPoint.pdf&quot; 
target=&quot;_blank&quot;&gt;Beating Floating Point at its Own Game: Posit 
Arithmetic&lt;/a&gt; &lt;a href=&quot;#fnref:posit&quot; 
class=&quot;reversefootnote&quot; 
role=&quot;doc-backlink&quot;&gt;&amp;#8617;&lt;/a&gt;&lt;/p&gt;
     &lt;/li&gt;
   &lt;/ol&gt;
-&lt;/div&gt;</content><author><name>Gus Smith, Andrew 
Liu</name></author><summary type="html">In this post, we describe the Bring 
Your Own Datatypes framework, which enables the use of custom datatypes within 
TVM.</summary></entry><entry><title type="html">How to Bring Your Own Codegen 
to TVM</title><link href="/2020/07/15/how-to-bring-your-own-codegen-to-tvm" 
rel="alternate" type="text/html" title="How to Bring Your Own Codegen to TVM" 
/><published>2020-07-15T00:00:00-07:00</published>< [...]
+&lt;/div&gt;</content><author><name>Gus Smith, Andrew 
Liu</name></author><summary type="html">In this post, we describe the Bring 
Your Own Datatypes framework, which enables the use of custom datatypes within 
TVM.</summary></entry><entry><title type="html">How to Bring Your Own Codegen 
to TVM</title><link href="/2020/07/15/how-to-bring-your-own-codegen-to-tvm" 
rel="alternate" type="text/html" title="How to Bring Your Own Codegen to TVM" 
/><published>2020-07-15T00:00:00-04:00</published>< [...]
 
 &lt;p&gt;However, users have to learn a new programming interface when they 
attempt to work with a new kernel library or a device. As a result, a unified 
programming interface becomes increasingly important so that all users and 
hardware backend providers can stand on the same page.&lt;/p&gt;
 
@@ -751,7 +751,7 @@ Figure 4: After Graph Partitioning.
 
 &lt;h2 id=&quot;acknowledgment&quot;&gt;Acknowledgment&lt;/h2&gt;
 
-&lt;p&gt;We would like to thank our colleague Animesh Jain for valuable 
discussions in the framework design; Tianqi Chen and Jared Roesch from OctoML 
for system design discussions and prototyping; Masahiro Masuda from the TVM 
community for helping with code review and improving the DNNL integration. We would also 
like to thank Ramana Radhakrishnan, Matthew Barrett, Manupa Karunaratne, and 
Luke Hutton from ARM, U.K. for contributing several helpful ideas, related 
Relay passes, and the Arm Compute Li [...]
+&lt;p&gt;We would like to thank our colleague Animesh Jain for valuable 
discussions in the framework design; Tianqi Chen and Jared Roesch from OctoML 
for system design discussions and prototyping; Masahiro Masuda from the TVM 
community for helping with code review and improving the DNNL integration. We would also 
like to thank Ramana Radhakrishnan, Matthew Barrett, Manupa Karunaratne, and 
Luke Hutton from ARM, U.K. for contributing several helpful ideas, related 
Relay passes, and the Arm Compute Li [...]
  the Jupyter Notebook to follow along is on &lt;a 
href=&quot;https://github.com/t-vi/pytorch-tvmisc/tree/master/transformers-pytorch-tvm/&quot;&gt;github&lt;/a&gt;.)&lt;/p&gt;
 
 &lt;p&gt;Some of the most intriguing applications of Artificial Intelligence 
have been in Natural Language Processing.
@@ -1264,7 +1264,7 @@ one would want to re-do cheap computation, most 
prominently point-wise computati
 &lt;h1 id=&quot;author&quot;&gt;Author&lt;/h1&gt;
 
 &lt;p&gt;&lt;a href=&quot;https://lernapparat.de/&quot;&gt;Thomas 
Viehmann&lt;/a&gt; is the founder of &lt;a 
href=&quot;https://mathinf.eu/&quot;&gt;MathInf GmbH&lt;/a&gt;, Munich, 
Germany, a boutique training and consultancy firm focusing on Machine Learning 
and PyTorch.
-He is a PyTorch core developer and co-authored &lt;a 
href=&quot;https://www.manning.com/books/deep-learning-with-pytorch&quot;&gt;Deep
 Learning with PyTorch&lt;/a&gt;, which is currently available as a &lt;a 
href=&quot;https://pytorch.org/deep-learning-with-pytorch&quot;&gt;free 
download from the PyTorch 
website&lt;/a&gt;.&lt;/p&gt;</content><author><name>Thomas Viehmann, MathInf 
GmbH</name></author><summary type="html"></summary></entry><entry><title 
type="html">TinyML - How TVM is Taming Ti [...]
+He is a PyTorch core developer and co-authored &lt;a 
href=&quot;https://www.manning.com/books/deep-learning-with-pytorch&quot;&gt;Deep
 Learning with PyTorch&lt;/a&gt;, which is currently available as a &lt;a 
href=&quot;https://pytorch.org/deep-learning-with-pytorch&quot;&gt;free 
download from the PyTorch 
website&lt;/a&gt;.&lt;/p&gt;</content><author><name>Thomas Viehmann, MathInf 
GmbH</name></author><summary type="html"></summary></entry><entry><title 
type="html">TinyML - How TVM is Taming Ti [...]
 
 &lt;p&gt;The proliferation of low-cost, AI-powered consumer devices has led to 
widespread interest in “bare-metal” (low-power, often without an operating 
system) devices among ML researchers and practitioners.  While it is already 
possible for experts to run &lt;em&gt;some&lt;/em&gt; models on 
&lt;em&gt;some&lt;/em&gt; bare-metal devices, optimizing models for diverse 
sets of devices is challenging, often requiring manually optimized 
device-specific libraries.  And for those platforms wi [...]
 
@@ -1563,7 +1563,7 @@ Diagram from CMSIS-NN paper showing a 2x2 matrix 
multiplication microkernel&lt;/
   &lt;li&gt;&lt;a 
href=&quot;https://homes.cs.washington.edu/~moreau/&quot;&gt;Thierry 
Moreau&lt;/a&gt;, for mentoring me during my time at OctoML.&lt;/li&gt;
   &lt;li&gt;&lt;a 
href=&quot;https://homes.cs.washington.edu/~vegaluis/&quot;&gt;Luis 
Vega&lt;/a&gt;, for teaching me the fundamentals of interacting with 
microcontrollers.&lt;/li&gt;
   &lt;li&gt;&lt;a 
href=&quot;https://www.linkedin.com/in/themadrasi/?originalSubdomain=uk&quot;&gt;Ramana
 Radhakrishnan&lt;/a&gt;, for supplying the Arm hardware used in our 
experiments and for providing guidance on its usage.&lt;/li&gt;
-&lt;/ul&gt;</content><author><name>Logan Weber and Andrew Reusch, 
OctoML</name></author><summary type="html"></summary></entry><entry><title 
type="html">Compiling Machine Learning to WASM and WebGPU with Apache 
TVM</title><link 
href="/2020/05/14/compiling-machine-learning-to-webassembly-and-webgpu" 
rel="alternate" type="text/html" title="Compiling Machine Learning to WASM and 
WebGPU with Apache TVM" 
/><published>2020-05-14T00:00:00-07:00</published><updated>2020-05-14T00:00:00-07:00</upd
 [...]
+&lt;/ul&gt;</content><author><name>Logan Weber and Andrew Reusch, 
OctoML</name></author><summary type="html"></summary></entry><entry><title 
type="html">Compiling Machine Learning to WASM and WebGPU with Apache 
TVM</title><link 
href="/2020/05/14/compiling-machine-learning-to-webassembly-and-webgpu" 
rel="alternate" type="text/html" title="Compiling Machine Learning to WASM and 
WebGPU with Apache TVM" 
/><published>2020-05-14T00:00:00-04:00</published><updated>2020-05-14T00:00:00-04:00</upd
 [...]
 
 &lt;p&gt;We introduced support for WASM and WebGPU to the Apache TVM deep 
learning compiler. Our experiments show that TVM’s WebGPU backend can get 
&lt;strong&gt;close to native&lt;/strong&gt; &lt;strong&gt;GPU 
performance&lt;/strong&gt; when deploying models to the web.&lt;/p&gt;
 
@@ -1641,7 +1641,7 @@ Diagram from CMSIS-NN paper showing a 2x2 matrix 
multiplication microkernel&lt;/
 
 &lt;h2 id=&quot;acknowledgement&quot;&gt;Acknowledgement&lt;/h2&gt;
 
-&lt;p&gt;We would like to thank the emscripten project for providing the WASM 
compilation infrastructure as well as the JS library support on the web. We 
would also like to thank the WebGPU community for various helpful discussions. 
Thanks to Fletcher Haynes for valuable feedback on the 
post.&lt;/p&gt;</content><author><name>Tianqi Chen and Jared Roesch, 
OctoML</name></author><summary type="html">TLDR</summary></entry><entry><title 
type="html">Integrating TVM into PyTorch</title><link  [...]
+&lt;p&gt;We would like to thank the emscripten project for providing the WASM 
compilation infrastructure as well as the JS library support on the web. We 
would also like to thank the WebGPU community for various helpful discussions. 
Thanks to Fletcher Haynes for valuable feedback on the 
post.&lt;/p&gt;</content><author><name>Tianqi Chen and Jared Roesch, 
OctoML</name></author><summary type="html">TLDR</summary></entry><entry><title 
type="html">Integrating TVM into PyTorch</title><link  [...]
 it has become clear that PyTorch stands to benefit from directly leveraging 
the compiler stack.
 A major tenet of PyTorch is providing seamless and robust integrations that 
don’t get in the user’s way.
 To that end, PyTorch now has an official TVM-based backend, &lt;a 
href=&quot;https://github.com/pytorch/tvm&quot;&gt;torch_tvm&lt;/a&gt;.&lt;/p&gt;
@@ -1733,7 +1733,7 @@ def mul(a, b, c):
 
 # via script
 relay_graph = torch_tvm.to_relay(mul, inputs)
-&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;</content><author><name>Bram 
Wasti</name></author><summary type="html">As TVM continuously demonstrates 
improvements to the efficiency of deep learning execution, it has become clear 
that PyTorch stands to benefit from directly leveraging the compiler stack. A 
major tenet of PyTorch is providing seamless and robust integrations that don’t 
get in the user’s way. To that end, PyTorch now has an official TVM-based 
backend, torch_tvm.</summary [...]
+&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;</content><author><name>Bram 
Wasti</name></author><summary type="html">As TVM continuously demonstrates 
improvements to the efficiency of deep learning execution, it has become clear 
that PyTorch stands to benefit from directly leveraging the compiler stack. A 
major tenet of PyTorch is providing seamless and robust integrations that don’t 
get in the user’s way. To that end, PyTorch now has an official TVM-based 
backend, torch_tvm.</summary [...]
In real-time scenarios such as inference on autonomous vehicles, the inference 
speed of the model is critical.
 Network quantization is an effective approach to accelerating deep learning 
models.
 In quantized models, both data and model parameters are represented with low 
precision data types such as &lt;code class=&quot;language-plaintext 
highlighter-rouge&quot;&gt;int8&lt;/code&gt; and &lt;code 
class=&quot;language-plaintext highlighter-rouge&quot;&gt;float16&lt;/code&gt;.
@@ -1800,7 +1800,7 @@ Figure 2. 2D convolution with data layout in NCHW4c and 
weight layout in OIHW4o4
 &lt;/div&gt;
 &lt;p&gt;&lt;/p&gt;
 
-&lt;p&gt;After we have specified the layout of convolution layers, other 
operators such as &lt;code class=&quot;language-plaintext 
highlighter-rouge&quot;&gt;add&lt;/code&gt; and activations can automatically 
adapt to the chosen layout during the &lt;a 
href=&quot;https://github.com/dmlc/tvm/blob/master/src/relay/pass/alter_op_layout.cc&quot;&gt;AlterOpLayout&lt;/a&gt;
 pass in Relay.
+&lt;p&gt;After we have specified the layout of convolution layers, other 
operators such as &lt;code class=&quot;language-plaintext 
highlighter-rouge&quot;&gt;add&lt;/code&gt; and activations can automatically 
adapt to the chosen layout during the &lt;a 
href=&quot;https://github.com/apache/incubator-tvm/blob/main/src/relay/pass/alter_op_layout.cc&quot;&gt;AlterOpLayout&lt;/a&gt;
 pass in Relay.
 The layout transformation of the weight can be precomputed offline. Therefore, 
we can run the whole model in the same layout without extra overhead.&lt;/p&gt;
 
 &lt;h2 
id=&quot;designing-search-space-for-automatic-optimization&quot;&gt;Designing 
Search Space for Automatic Optimization&lt;/h2&gt;
@@ -1861,14 +1861,14 @@ We show that automatic optimization in TVM makes it 
easy and flexible to support
 &lt;h1 id=&quot;show-me-the-code&quot;&gt;Show Me the Code&lt;/h1&gt;
 &lt;ul&gt;
   &lt;li&gt;&lt;a 
href=&quot;https://github.com/vinx13/tvm-cuda-int8-benchmark&quot;&gt;Benchmark&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;&lt;a 
href=&quot;https://github.com/dmlc/tvm/blob/master/topi/python/topi/cuda/conv2d_int8.py&quot;&gt;CUDA
 int8 conv2d&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;&lt;a 
href=&quot;https://github.com/dmlc/tvm/blob/master/topi/python/topi/cuda/group_conv2d_nchw.py&quot;&gt;CUDA
 int8 group conv2d&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;&lt;a 
href=&quot;https://github.com/dmlc/tvm/blob/master/topi/python/topi/cuda/dense.py&quot;&gt;CUDA
 int8 dense&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;&lt;a 
href=&quot;https://github.com/dmlc/tvm/blob/master/topi/python/topi/cuda/tensor_intrin.py&quot;&gt;Tensor
 intrinsics declaration&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;&lt;a 
href=&quot;https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/cuda/conv2d_int8.py&quot;&gt;CUDA
 int8 conv2d&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;&lt;a 
href=&quot;https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/cuda/group_conv2d_nchw.py&quot;&gt;CUDA
 int8 group conv2d&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;&lt;a 
href=&quot;https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/cuda/dense.py&quot;&gt;CUDA
 int8 dense&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;&lt;a 
href=&quot;https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/cuda/tensor_intrin.py&quot;&gt;Tensor
 intrinsics declaration&lt;/a&gt;&lt;/li&gt;
 &lt;/ul&gt;
 
 &lt;h1 id=&quot;bio--acknowledgement&quot;&gt;Bio &amp;amp; 
Acknowledgement&lt;/h1&gt;
-&lt;p&gt;&lt;a href=&quot;https://wuwei.io/&quot;&gt;Wuwei Lin&lt;/a&gt; is an 
undergraduate student at SJTU. He is currently an intern at TuSimple. The 
author has many thanks to &lt;a 
href=&quot;https://homes.cs.washington.edu/~tqchen/&quot;&gt;Tianqi 
Chen&lt;/a&gt; and &lt;a 
href=&quot;https://homes.cs.washington.edu/~eqy/&quot;&gt;Eddie Yan&lt;/a&gt; 
for their reviews.&lt;/p&gt;</content><author><name>Wuwei 
Lin</name></author><summary type="html">Deep learning has been successfully ap 
[...]
+&lt;p&gt;&lt;a href=&quot;https://wuwei.io/&quot;&gt;Wuwei Lin&lt;/a&gt; is an 
undergraduate student at SJTU. He is currently an intern at TuSimple. The 
author has many thanks to &lt;a 
href=&quot;https://homes.cs.washington.edu/~tqchen/&quot;&gt;Tianqi 
Chen&lt;/a&gt; and &lt;a 
href=&quot;https://homes.cs.washington.edu/~eqy/&quot;&gt;Eddie Yan&lt;/a&gt; 
for their reviews.&lt;/p&gt;</content><author><name>Wuwei 
Lin</name></author><summary type="html">Deep learning has been successfully ap 
[...]
 
 &lt;p&gt;TVM is an open source deep learning compiler stack that closes the 
gap between the productivity-focused deep learning frameworks and the 
performance- or efficiency-oriented hardware backends. Today, we are glad to 
announce that the TVM community has decided to move to the Apache Incubator and 
become an Apache (incubating) project.&lt;/p&gt;
 
@@ -1882,7 +1882,7 @@ We show that automatic optimization in TVM makes it easy 
and flexible to support
 
 &lt;p&gt;We would like to take this chance to thank the Allen School for 
supporting the SAMPL team that gave birth to the TVM project. We would also 
like to thank the Halide project which provided the basis for TVM’s loop-level 
IR and initial code generation. We would like to thank our Apache incubator 
mentors for introducing the project to Apache and providing useful guidance. 
Finally, we would like to thank the TVM community and all of the organizations, 
as listed above, that supported [...]
 
-&lt;p&gt;See also the &lt;a 
href=&quot;https://news.cs.washington.edu/2019/03/18/allen-schools-tvm-deep-learning-compiler-framework-transitions-to-apache/&quot;&gt;Allen
 School news about the transition here&lt;/a&gt;, &lt;a 
href=&quot;https://sampl.cs.washington.edu/tvmconf/#about-tvmconf&quot;&gt;TVM 
conference program slides and recordings&lt;/a&gt;, and &lt;a 
href=&quot;https://tvm.apache.org/docs//contribute/community.html&quot;&gt;our 
community guideline here&lt;/a&gt;. Follow us o [...]
+&lt;p&gt;See also the &lt;a 
href=&quot;https://news.cs.washington.edu/2019/03/18/allen-schools-tvm-deep-learning-compiler-framework-transitions-to-apache/&quot;&gt;Allen
 School news about the transition here&lt;/a&gt;, &lt;a 
href=&quot;https://sampl.cs.washington.edu/tvmconf/#about-tvmconf&quot;&gt;TVM 
conference program slides and recordings&lt;/a&gt;, and &lt;a 
href=&quot;https://tvm.apache.org/docs//contribute/community.html&quot;&gt;our 
community guideline here&lt;/a&gt;. Follow us o [...]
 
 &lt;p&gt;TVM is an open deep learning compiler stack to compile various deep 
learning models from different
 frameworks to CPU, GPU or specialized accelerators.  TVM supports model 
compilation from a wide range
@@ -2023,14 +2023,14 @@ For simplicity the error handling is ignored here, but 
is important in real appl
 &lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
 
 &lt;p&gt;&lt;code class=&quot;language-plaintext 
highlighter-rouge&quot;&gt;gotvm&lt;/code&gt; extends the TVM packed function 
system to support golang function closures as packed functions.
-&lt;a 
href=&quot;https://github.com/dmlc/tvm/blob/master/golang/sample&quot;&gt;Examples&lt;/a&gt;
 are available that register a golang
+&lt;a 
href=&quot;https://github.com/apache/incubator-tvm/blob/main/golang/sample&quot;&gt;Examples&lt;/a&gt;
 are available that register a golang
closure as a TVM packed function and invoke it across programming language 
barriers.&lt;/p&gt;
 
 &lt;h2 id=&quot;show-me-the-code&quot;&gt;Show me the code&lt;/h2&gt;
 
 &lt;ul&gt;
-  &lt;li&gt;&lt;a 
href=&quot;https://github.com/dmlc/tvm/blob/master/golang/src&quot;&gt;Package 
Source&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;&lt;a 
href=&quot;https://github.com/dmlc/tvm/blob/master/golang/sample&quot;&gt;Examples&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;&lt;a 
href=&quot;https://github.com/apache/incubator-tvm/blob/main/golang/src&quot;&gt;Package
 Source&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;&lt;a 
href=&quot;https://github.com/apache/incubator-tvm/blob/main/golang/sample&quot;&gt;Examples&lt;/a&gt;&lt;/li&gt;
 &lt;/ul&gt;
 
 &lt;h2 id=&quot;references&quot;&gt;References&lt;/h2&gt;
@@ -2043,7 +2043,7 @@ closure as TVM packed function and invoke the same across 
programming language b
   &lt;li&gt;[5] &lt;a 
href=&quot;https://blog.learngoprogramming.com/golang-variadic-funcs-how-to-patterns-369408f19085&quot;&gt;Go
 Variadic Functions&lt;/a&gt;&lt;/li&gt;
   &lt;li&gt;[6] &lt;a 
href=&quot;https://github.com/jdeng/gomxnet&quot;&gt;CFFI 
Ref&lt;/a&gt;&lt;/li&gt;
   &lt;li&gt;[7] &lt;a 
href=&quot;https://golang.org/pkg/runtime/#SetFinalizer&quot;&gt;Go 
Finalizers&lt;/a&gt;&lt;/li&gt;
-&lt;/ul&gt;</content><author><name>Siva</name></author><summary 
type="html">Introduction</summary></entry><entry><title type="html">Automating 
Generation of Low Precision Deep Learning Operators</title><link 
href="/2018/12/18/lowprecision-conv" rel="alternate" type="text/html" 
title="Automating Generation of Low Precision Deep Learning Operators" 
/><published>2018-12-18T00:00:00-08:00</published><updated>2018-12-18T00:00:00-08:00</updated><id>/2018/12/18/lowprecision-conv</id><content
 ty [...]
+&lt;/ul&gt;</content><author><name>Siva</name></author><summary 
type="html">Introduction</summary></entry><entry><title type="html">Automating 
Generation of Low Precision Deep Learning Operators</title><link 
href="/2018/12/18/lowprecision-conv" rel="alternate" type="text/html" 
title="Automating Generation of Low Precision Deep Learning Operators" 
/><published>2018-12-18T00:00:00-05:00</published><updated>2018-12-18T00:00:00-05:00</updated><id>/2018/12/18/lowprecision-conv</id><content
 ty [...]
 devices becomes challenging because of their limited compute and energy 
budgets. A recent trend
 in deep learning is the use of extremely quantized models that 
operate on inputs and
 weights of a few bits, with networks like XNOR-Net, DoReFa-Net, and 
HWGQ-Net making steady
@@ -2183,8 +2183,8 @@ Note: x86 doesn’t support a vectorized popcount for this 
microarchitecture, so
 &lt;h2 id=&quot;show-me-the-code&quot;&gt;Show me the code&lt;/h2&gt;
 
 &lt;ul&gt;
-  &lt;li&gt;&lt;a 
href=&quot;https://github.com/dmlc/tvm/blob/master/topi/python/topi/nn/bitserial_conv2d.py&quot;&gt;TOPI
 bitserial convolution&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;&lt;a 
href=&quot;https://github.com/dmlc/tvm/blob/master/topi/python/topi/arm_cpu/bitserial_conv2d.py&quot;&gt;TOPI
 ARM cpu bitserial convolution&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;&lt;a 
href=&quot;https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/nn/bitserial_conv2d.py&quot;&gt;TOPI
 bitserial convolution&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;&lt;a 
href=&quot;https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/arm_cpu/bitserial_conv2d.py&quot;&gt;TOPI
 ARM cpu bitserial convolution&lt;/a&gt;&lt;/li&gt;
 &lt;/ul&gt;
 
 &lt;h2 id=&quot;references&quot;&gt;References&lt;/h2&gt;
diff --git a/rss.xml b/rss.xml
index f44a1bc..cc2324e 100644
--- a/rss.xml
+++ b/rss.xml
@@ -5,8 +5,8 @@
         <description>TVM - </description>
         <link>https://tvm.apache.org</link>
         <atom:link href="https://tvm.apache.org"; rel="self" 
type="application/rss+xml" />
-        <lastBuildDate>Mon, 02 Nov 2020 16:31:02 -0800</lastBuildDate>
-        <pubDate>Mon, 02 Nov 2020 16:31:02 -0800</pubDate>
+        <lastBuildDate>Tue, 03 Nov 2020 09:01:59 -0500</lastBuildDate>
+        <pubDate>Tue, 03 Nov 2020 09:01:59 -0500</pubDate>
         <ttl>60</ttl>
 
 
@@ -300,7 +300,7 @@ For more documentation about the Bring Your Own Datatypes 
framework
 </description>
                 
<link>https://tvm.apache.org/2020/09/26/bring-your-own-datatypes</link>
                 
<guid>https://tvm.apache.org/2020/09/26/bring-your-own-datatypes</guid>
-                <pubDate>Sat, 26 Sep 2020 00:00:00 -0700</pubDate>
+                <pubDate>Sat, 26 Sep 2020 00:00:00 -0400</pubDate>
         </item>
 
         <item>
@@ -779,7 +779,7 @@ Figure 4: After Graph Partitioning.
 </description>
                 
<link>https://tvm.apache.org/2020/07/15/how-to-bring-your-own-codegen-to-tvm</link>
                 
<guid>https://tvm.apache.org/2020/07/15/how-to-bring-your-own-codegen-to-tvm</guid>
-                <pubDate>Wed, 15 Jul 2020 00:00:00 -0700</pubDate>
+                <pubDate>Wed, 15 Jul 2020 00:00:00 -0400</pubDate>
         </item>
 
         <item>
@@ -1302,7 +1302,7 @@ He is a PyTorch core developer and co-authored &lt;a 
href=&quot;https://www.mann
 </description>
                 <link>https://tvm.apache.org/2020/07/14/bert-pytorch-tvm</link>
                 <guid>https://tvm.apache.org/2020/07/14/bert-pytorch-tvm</guid>
-                <pubDate>Tue, 14 Jul 2020 00:00:00 -0700</pubDate>
+                <pubDate>Tue, 14 Jul 2020 00:00:00 -0400</pubDate>
         </item>
 
         <item>
@@ -1611,7 +1611,7 @@ Diagram from CMSIS-NN paper showing a 2x2 matrix 
multiplication microkernel&lt;/
 </description>
                 
<link>https://tvm.apache.org/2020/06/04/tinyml-how-tvm-is-taming-tiny</link>
                 
<guid>https://tvm.apache.org/2020/06/04/tinyml-how-tvm-is-taming-tiny</guid>
-                <pubDate>Thu, 04 Jun 2020 00:00:00 -0700</pubDate>
+                <pubDate>Thu, 04 Jun 2020 00:00:00 -0400</pubDate>
         </item>
 
         <item>
@@ -1698,7 +1698,7 @@ Diagram from CMSIS-NN paper showing a 2x2 matrix 
multiplication microkernel&lt;/
 </description>
                 
<link>https://tvm.apache.org/2020/05/14/compiling-machine-learning-to-webassembly-and-webgpu</link>
                 
<guid>https://tvm.apache.org/2020/05/14/compiling-machine-learning-to-webassembly-and-webgpu</guid>
-                <pubDate>Thu, 14 May 2020 00:00:00 -0700</pubDate>
+                <pubDate>Thu, 14 May 2020 00:00:00 -0400</pubDate>
         </item>
 
         <item>
@@ -1800,7 +1800,7 @@ relay_graph = torch_tvm.to_relay(mul, inputs)
 </description>
                 <link>https://tvm.apache.org/2019/05/30/pytorch-frontend</link>
                 <guid>https://tvm.apache.org/2019/05/30/pytorch-frontend</guid>
-                <pubDate>Thu, 30 May 2019 00:00:00 -0700</pubDate>
+                <pubDate>Thu, 30 May 2019 00:00:00 -0400</pubDate>
         </item>
 
         <item>
@@ -1872,7 +1872,7 @@ Figure 2. 2D convolution with data layout in NCHW4c and 
weight layout in OIHW4o4
 &lt;/div&gt;
 &lt;p&gt;&lt;/p&gt;
 
-&lt;p&gt;After we have specified the layout of convolution layers, other 
operators such as &lt;code class=&quot;language-plaintext 
highlighter-rouge&quot;&gt;add&lt;/code&gt; and activations can automatically 
adapt to the chosen layout during the &lt;a 
href=&quot;https://github.com/dmlc/tvm/blob/master/src/relay/pass/alter_op_layout.cc&quot;&gt;AlterOpLayout&lt;/a&gt;
 pass in Relay.
+&lt;p&gt;After we have specified the layout of convolution layers, other 
operators such as &lt;code class=&quot;language-plaintext 
highlighter-rouge&quot;&gt;add&lt;/code&gt; and activations can automatically 
adapt to the chosen layout during the &lt;a 
href=&quot;https://github.com/apache/incubator-tvm/blob/main/src/relay/pass/alter_op_layout.cc&quot;&gt;AlterOpLayout&lt;/a&gt;
 pass in Relay.
 The layout transformation of the weight can be precomputed offline. Therefore, 
we can run the whole model in the same layout without extra overhead.&lt;/p&gt;
 
 &lt;h2 
id=&quot;designing-search-space-for-automatic-optimization&quot;&gt;Designing 
Search Space for Automatic Optimization&lt;/h2&gt;
@@ -1933,10 +1933,10 @@ We show that automatic optimization in TVM makes it 
easy and flexible to support
 &lt;h1 id=&quot;show-me-the-code&quot;&gt;Show Me the Code&lt;/h1&gt;
 &lt;ul&gt;
   &lt;li&gt;&lt;a 
href=&quot;https://github.com/vinx13/tvm-cuda-int8-benchmark&quot;&gt;Benchmark&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;&lt;a 
href=&quot;https://github.com/dmlc/tvm/blob/master/topi/python/topi/cuda/conv2d_int8.py&quot;&gt;CUDA
 int8 conv2d&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;&lt;a 
href=&quot;https://github.com/dmlc/tvm/blob/master/topi/python/topi/cuda/group_conv2d_nchw.py&quot;&gt;CUDA
 int8 group conv2d&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;&lt;a 
href=&quot;https://github.com/dmlc/tvm/blob/master/topi/python/topi/cuda/dense.py&quot;&gt;CUDA
 int8 dense&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;&lt;a 
href=&quot;https://github.com/dmlc/tvm/blob/master/topi/python/topi/cuda/tensor_intrin.py&quot;&gt;Tensor
 intrinsics declaration&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;&lt;a 
href=&quot;https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/cuda/conv2d_int8.py&quot;&gt;CUDA
 int8 conv2d&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;&lt;a 
href=&quot;https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/cuda/group_conv2d_nchw.py&quot;&gt;CUDA
 int8 group conv2d&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;&lt;a 
href=&quot;https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/cuda/dense.py&quot;&gt;CUDA
 int8 dense&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;&lt;a 
href=&quot;https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/cuda/tensor_intrin.py&quot;&gt;Tensor
 intrinsics declaration&lt;/a&gt;&lt;/li&gt;
 &lt;/ul&gt;
 
 &lt;h1 id=&quot;bio--acknowledgement&quot;&gt;Bio &amp;amp; 
Acknowledgement&lt;/h1&gt;
@@ -1944,7 +1944,7 @@ We show that automatic optimization in TVM makes it easy 
and flexible to support
 </description>
                 
<link>https://tvm.apache.org/2019/04/29/opt-cuda-quantized</link>
                 
<guid>https://tvm.apache.org/2019/04/29/opt-cuda-quantized</guid>
-                <pubDate>Mon, 29 Apr 2019 09:00:00 -0700</pubDate>
+                <pubDate>Mon, 29 Apr 2019 12:00:00 -0400</pubDate>
         </item>
 
         <item>
@@ -1967,7 +1967,7 @@ We show that automatic optimization in TVM makes it easy 
and flexible to support
 </description>
                 
<link>https://tvm.apache.org/2019/03/18/tvm-apache-announcement</link>
                 
<guid>https://tvm.apache.org/2019/03/18/tvm-apache-announcement</guid>
-                <pubDate>Mon, 18 Mar 2019 00:00:00 -0700</pubDate>
+                <pubDate>Mon, 18 Mar 2019 00:00:00 -0400</pubDate>
         </item>
 
         <item>
@@ -2113,14 +2113,14 @@ For simplicity the error handling is ignored here, but 
is important in real appl
 &lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
 
 &lt;p&gt;&lt;code class=&quot;language-plaintext 
highlighter-rouge&quot;&gt;gotvm&lt;/code&gt; extends the TVM packed function 
system to support golang function closures as packed functions.
-&lt;a 
href=&quot;https://github.com/dmlc/tvm/blob/master/golang/sample&quot;&gt;Examples&lt;/a&gt;
 are available that register a golang
+&lt;a 
href=&quot;https://github.com/apache/incubator-tvm/blob/main/golang/sample&quot;&gt;Examples&lt;/a&gt;
 are available that register a golang
closure as a TVM packed function and invoke it across programming language 
barriers.&lt;/p&gt;
 
 &lt;h2 id=&quot;show-me-the-code&quot;&gt;Show me the code&lt;/h2&gt;
 
 &lt;ul&gt;
-  &lt;li&gt;&lt;a 
href=&quot;https://github.com/dmlc/tvm/blob/master/golang/src&quot;&gt;Package 
Source&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;&lt;a 
href=&quot;https://github.com/dmlc/tvm/blob/master/golang/sample&quot;&gt;Examples&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;&lt;a 
href=&quot;https://github.com/apache/incubator-tvm/blob/main/golang/src&quot;&gt;Package
 Source&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;&lt;a 
href=&quot;https://github.com/apache/incubator-tvm/blob/main/golang/sample&quot;&gt;Examples&lt;/a&gt;&lt;/li&gt;
 &lt;/ul&gt;
 
 &lt;h2 id=&quot;references&quot;&gt;References&lt;/h2&gt;
@@ -2137,7 +2137,7 @@ closure as TVM packed function and invoke the same across 
programming language b
 </description>
                 <link>https://tvm.apache.org/2019/01/19/Golang</link>
                 <guid>https://tvm.apache.org/2019/01/19/Golang</guid>
-                <pubDate>Sat, 19 Jan 2019 00:00:00 -0800</pubDate>
+                <pubDate>Sat, 19 Jan 2019 00:00:00 -0500</pubDate>
         </item>
 
         <item>
@@ -2282,8 +2282,8 @@ Note: x86 doesn’t support a vectorized popcount for this 
microarchitecture, so
 &lt;h2 id=&quot;show-me-the-code&quot;&gt;Show me the code&lt;/h2&gt;
 
 &lt;ul&gt;
-  &lt;li&gt;&lt;a 
href=&quot;https://github.com/dmlc/tvm/blob/master/topi/python/topi/nn/bitserial_conv2d.py&quot;&gt;TOPI
 bitserial convolution&lt;/a&gt;&lt;/li&gt;
-  &lt;li&gt;&lt;a 
href=&quot;https://github.com/dmlc/tvm/blob/master/topi/python/topi/arm_cpu/bitserial_conv2d.py&quot;&gt;TOPI
 ARM cpu bitserial convolution&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;&lt;a 
href=&quot;https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/nn/bitserial_conv2d.py&quot;&gt;TOPI
 bitserial convolution&lt;/a&gt;&lt;/li&gt;
+  &lt;li&gt;&lt;a 
href=&quot;https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/arm_cpu/bitserial_conv2d.py&quot;&gt;TOPI
 ARM cpu bitserial convolution&lt;/a&gt;&lt;/li&gt;
 &lt;/ul&gt;
 
 &lt;h2 id=&quot;references&quot;&gt;References&lt;/h2&gt;
@@ -2298,7 +2298,7 @@ Note: x86 doesn’t support a vectorized popcount for this 
microarchitecture, so
 </description>
                 
<link>https://tvm.apache.org/2018/12/18/lowprecision-conv</link>
                 
<guid>https://tvm.apache.org/2018/12/18/lowprecision-conv</guid>
-                <pubDate>Tue, 18 Dec 2018 00:00:00 -0800</pubDate>
+                <pubDate>Tue, 18 Dec 2018 00:00:00 -0500</pubDate>
         </item>
 
         <item>
@@ -2414,7 +2414,7 @@ His research interest is in the general domain of ML on 
shared private data, but
 </description>
                 <link>https://tvm.apache.org/2018/10/09/ml-in-tees</link>
                 <guid>https://tvm.apache.org/2018/10/09/ml-in-tees</guid>
-                <pubDate>Tue, 09 Oct 2018 00:00:00 -0700</pubDate>
+                <pubDate>Tue, 09 Oct 2018 00:00:00 -0400</pubDate>
         </item>
 
         <item>
@@ -2808,7 +2808,7 @@ for inference deployment. TVM just provides such a 
solution.&lt;/p&gt;
 </description>
                 <link>https://tvm.apache.org/2018/10/03/auto-opt-all</link>
                 <guid>https://tvm.apache.org/2018/10/03/auto-opt-all</guid>
-                <pubDate>Wed, 03 Oct 2018 00:00:00 -0700</pubDate>
+                <pubDate>Wed, 03 Oct 2018 00:00:00 -0400</pubDate>
         </item>
 
         <item>
@@ -2923,7 +2923,7 @@ found &lt;a 
href=&quot;https://tvm.apache.org/docs//tutorials/optimize/opt_gemm.
 &lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
 
 &lt;h2 id=&quot;under-the-hood-of-the-pytorch-example&quot;&gt;Under the hood 
of the PyTorch Example&lt;/h2&gt;
-&lt;p&gt;As TVM provides &lt;a 
href=&quot;https://github.com/dmlc/tvm/blob/master/include/tvm/runtime/c_runtime_api.h#L455&quot;&gt;functions&lt;/a&gt;
 to convert dlpack tensors to tvm &lt;code class=&quot;language-plaintext 
highlighter-rouge&quot;&gt;NDArray&lt;/code&gt;s and
+&lt;p&gt;As TVM provides &lt;a 
href=&quot;https://github.com/apache/incubator-tvm/blob/main/include/tvm/runtime/c_runtime_api.h#L455&quot;&gt;functions&lt;/a&gt;
 to convert dlpack tensors to tvm &lt;code class=&quot;language-plaintext 
highlighter-rouge&quot;&gt;NDArray&lt;/code&gt;s and
vice-versa, all that is needed is some syntactic sugar by wrapping 
functions.
 &lt;code class=&quot;language-plaintext 
highlighter-rouge&quot;&gt;convert_func&lt;/code&gt; is a generic converter for 
frameworks using tensors with dlpack
 support, and can be used to implement convenient converters, such as
@@ -2947,7 +2947,7 @@ support, and can be used to implement convenient 
converters, such as
 </description>
                 <link>https://tvm.apache.org/2018/08/10/DLPack-Bridge</link>
                 <guid>https://tvm.apache.org/2018/08/10/DLPack-Bridge</guid>
-                <pubDate>Fri, 10 Aug 2018 00:00:00 -0700</pubDate>
+                <pubDate>Fri, 10 Aug 2018 00:00:00 -0400</pubDate>
         </item>
 
         <item>
@@ -2962,7 +2962,7 @@ support, and can be used to implement convenient 
converters, such as
 
 &lt;p&gt;VTA is more than a standalone accelerator design: it’s an end-to-end 
solution that includes drivers, a JIT runtime, and an optimizing compiler stack 
based on TVM. The current release includes a behavioral hardware simulator, as 
well as the infrastructure to deploy VTA on low-cost FPGA hardware for fast 
prototyping. By extending the TVM stack with a customizable and open source 
deep learning hardware accelerator design, we are exposing a transparent 
end-to-end deep learning stac [...]
 
-&lt;p style=&quot;text-align: center&quot;&gt;&lt;img 
src=&quot;https://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_stack.png&quot;
 alt=&quot;image&quot; width=&quot;50%&quot; /&gt;&lt;/p&gt;
+&lt;p style=&quot;text-align: center&quot;&gt;&lt;img 
src=&quot;https://raw.githubusercontent.com/uwsampl/web-data/master/vta/blogpost/vta_stack.png&quot;
 alt=&quot;image&quot; width=&quot;50%&quot; /&gt;&lt;/p&gt;
 
 &lt;p&gt;The VTA and TVM stack together constitute a blueprint for an end-to-end, 
accelerator-centric deep learning system that can:&lt;/p&gt;
 
@@ -3017,7 +3017,7 @@ The extendability of the compiler stack, combined with 
the ability to modify the
 &lt;p&gt;The Vanilla Tensor Accelerator (VTA) is a generic deep learning 
accelerator built around a GEMM core, which performs dense matrix 
multiplication at a high computational throughput.
The design is inspired by mainstream deep learning accelerators such as 
Google’s TPU. The design adopts decoupled access-execute to hide 
memory access latency and maximize utilization of compute resources. To a 
broader extent, VTA can serve as a template deep learning accelerator design, 
exposing a clean tensor computation abstraction to the compiler stack.&lt;/p&gt;
 
-&lt;p style=&quot;text-align: center&quot;&gt;&lt;img 
src=&quot;https://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_overview.png&quot;
 alt=&quot;image&quot; width=&quot;60%&quot; /&gt;&lt;/p&gt;
+&lt;p style=&quot;text-align: center&quot;&gt;&lt;img 
src=&quot;https://raw.githubusercontent.com/uwsampl/web-data/master/vta/blogpost/vta_overview.png&quot;
 alt=&quot;image&quot; width=&quot;60%&quot; /&gt;&lt;/p&gt;
 
 &lt;p&gt;The figure above presents a high-level overview of the VTA hardware 
organization. VTA is composed of four modules that communicate with each 
other via FIFO queues and single-writer/single-reader SRAM memory blocks, to 
allow for task-level pipeline parallelism.
 The compute module performs both dense linear algebra computation with its 
GEMM core, and general computation with its tensor ALU.
@@ -3034,7 +3034,7 @@ The first approach, which doesn’t require special 
hardware is to run deep lear
 This simulator back-end is readily available for developers to experiment with.
 The second approach relies on an off-the-shelf and low-cost FPGA development 
board – the &lt;a href=&quot;http://www.pynq.io/&quot;&gt;Pynq board&lt;/a&gt;, 
which exposes a reconfigurable FPGA fabric and an ARM SoC.&lt;/p&gt;
 
-&lt;p style=&quot;text-align: center&quot;&gt;&lt;img 
src=&quot;https://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_system.png&quot;
 alt=&quot;image&quot; width=&quot;70%&quot; /&gt;&lt;/p&gt;
+&lt;p style=&quot;text-align: center&quot;&gt;&lt;img 
src=&quot;https://raw.githubusercontent.com/uwsampl/web-data/master/vta/blogpost/vta_system.png&quot;
 alt=&quot;image&quot; width=&quot;70%&quot; /&gt;&lt;/p&gt;
 
 &lt;p&gt;The VTA release offers a simple compilation and deployment flow of 
the VTA hardware design and TVM workloads on the Pynq platform, with the help 
of an RPC server interface.
 The RPC server handles FPGA reconfiguration tasks and TVM module invocation 
offloading onto the VTA runtime.
@@ -3057,7 +3057,7 @@ While this platform is meant for prototyping (the 2012 
FPGA cannot compete with
 &lt;p&gt;A popular method used to assess the efficient use of hardware is the 
roofline diagram: given a hardware design, how efficiently are different 
workloads utilizing the hardware compute and memory resources. The roofline 
plot below shows the throughput achieved on different convolution layers of the 
ResNet-18 inference benchmark. Each layer has a different arithmetic intensity, 
i.e. the ratio of compute to data movement.
 In the left half, convolution layers are bandwidth limited, whereas on the 
right half, they are compute limited.&lt;/p&gt;
 
-&lt;p style=&quot;text-align: center&quot;&gt;&lt;img 
src=&quot;https://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_roofline.png&quot;
 alt=&quot;image&quot; width=&quot;60%&quot; /&gt;&lt;/p&gt;
+&lt;p style=&quot;text-align: center&quot;&gt;&lt;img 
src=&quot;https://raw.githubusercontent.com/uwsampl/web-data/master/vta/blogpost/vta_roofline.png&quot;
 alt=&quot;image&quot; width=&quot;60%&quot; /&gt;&lt;/p&gt;
 
 &lt;p&gt;The goal behind designing a hardware architecture and a compiler 
stack is to bring each workload as close as possible to the roofline of the 
target hardware.
 The roofline plot shows the effects of having the hardware and compiler work 
together to maximize utilization of the available hardware resources.
@@ -3066,7 +3066,7 @@ The result is an overall higher utilization of the 
available compute and memory
 
 &lt;h3 id=&quot;end-to-end-resnet-18-evaluation&quot;&gt;End to end ResNet-18 
evaluation&lt;/h3&gt;
 
-&lt;p style=&quot;text-align: center&quot;&gt;&lt;img 
src=&quot;https://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_e2e.png&quot;
 alt=&quot;image&quot; width=&quot;60%&quot; /&gt;&lt;/p&gt;
+&lt;p style=&quot;text-align: center&quot;&gt;&lt;img 
src=&quot;https://raw.githubusercontent.com/uwsampl/web-data/master/vta/blogpost/vta_e2e.png&quot;
 alt=&quot;image&quot; width=&quot;60%&quot; /&gt;&lt;/p&gt;
 
 &lt;p&gt;A benefit of having a complete compiler stack built for VTA is the 
ability to run end-to-end workloads. This is compelling in the context of 
hardware acceleration because we need to understand what performance 
bottlenecks and Amdahl limitations stand in the way of obtaining faster 
performance.
 The bar plot above shows inference performance with and without offloading the 
ResNet convolutional layers to the FPGA-based VTA design, on the Pynq board’s 
ARM Cortex A9 SoC.
@@ -3089,7 +3089,7 @@ This kind of high-level visibility is essential to system 
designers who want to
 </description>
                 
<link>https://tvm.apache.org/2018/07/12/vta-release-announcement</link>
                 
<guid>https://tvm.apache.org/2018/07/12/vta-release-announcement</guid>
-                <pubDate>Thu, 12 Jul 2018 00:00:00 -0700</pubDate>
+                <pubDate>Thu, 12 Jul 2018 00:00:00 -0400</pubDate>
         </item>
 
         <item>
@@ -3355,7 +3355,7 @@ C = tvm.compute(
 </description>
                 
<link>https://tvm.apache.org/2018/03/23/nmt-transformer-optimize</link>
                 
<guid>https://tvm.apache.org/2018/03/23/nmt-transformer-optimize</guid>
-                <pubDate>Fri, 23 Mar 2018 00:00:00 -0700</pubDate>
+                <pubDate>Fri, 23 Mar 2018 00:00:00 -0400</pubDate>
         </item>
 
         <item>
@@ -3471,7 +3471,7 @@ optimizations into the TVM stack.&lt;/p&gt;
 </description>
                 <link>https://tvm.apache.org/2018/03/12/webgl</link>
                 <guid>https://tvm.apache.org/2018/03/12/webgl</guid>
-                <pubDate>Mon, 12 Mar 2018 00:00:00 -0700</pubDate>
+                <pubDate>Mon, 12 Mar 2018 00:00:00 -0400</pubDate>
         </item>
 
         <item>
@@ -4045,7 +4045,7 @@ advice and &lt;a 
href=&quot;https://github.com/yzhliu&quot;&gt;Yizhi Liu&lt;/a&g
 </description>
                 <link>https://tvm.apache.org/2018/01/16/opt-mali-gpu</link>
                 <guid>https://tvm.apache.org/2018/01/16/opt-mali-gpu</guid>
-                <pubDate>Tue, 16 Jan 2018 00:00:00 -0800</pubDate>
+                <pubDate>Tue, 16 Jan 2018 00:00:00 -0500</pubDate>
         </item>
 
         <item>
@@ -4273,7 +4273,7 @@ make jvminstall
 </description>
                 
<link>https://tvm.apache.org/2017/11/08/android-rpc-introduction</link>
                 
<guid>https://tvm.apache.org/2017/11/08/android-rpc-introduction</guid>
-                <pubDate>Wed, 08 Nov 2017 00:00:00 -0800</pubDate>
+                <pubDate>Wed, 08 Nov 2017 00:00:00 -0500</pubDate>
         </item>
 
         <item>
@@ -4334,7 +4334,7 @@ TVM prediction top-1: 282 tiger 
cat&lt;/code&gt;&lt;/pre&gt;&lt;/figure&gt;
 
 &lt;h2 id=&quot;a-note-on-performance&quot;&gt;A Note on performance&lt;/h2&gt;
 
-&lt;p&gt;The current ROCm support focuses on functionality coverage. We 
have already seen promising performance results by simply adopting existing TVM 
schedules for the CUDA backend. For example, you can try running &lt;a 
href=&quot;https://github.com/dmlc/tvm/blob/master/topi/recipe/gemm/cuda_gemm_square.py&quot;&gt;the
 gemm test script&lt;/a&gt; in the TVM repository and see the result. For two 
types of cards we tested, the current gemm recipe for square matrix 
multiplication (not  [...]
+&lt;p&gt;The current ROCm support focuses on functionality coverage. We 
have already seen promising performance results by simply adopting existing TVM 
schedules for the CUDA backend. For example, you can try running &lt;a 
href=&quot;https://github.com/apache/incubator-tvm/blob/main/topi/recipe/gemm/cuda_gemm_square.py&quot;&gt;the
 gemm test script&lt;/a&gt; in the TVM repository and see the result. For two 
types of cards we tested, the current gemm recipe for square matrix multiplica 
[...]
This is already a promising start, as it is very hard to optimize performance 
all the way to peak, and we
have not yet applied AMD GPU-specific optimizations.
We are starting to look at performance optimization and expect more 
improvements to come.&lt;/p&gt;
@@ -4499,7 +4499,7 @@ BB0_6:
 </description>
                 
<link>https://tvm.apache.org/2017/10/30/Bringing-AMDGPUs-to-TVM-Stack-and-NNVM-Compiler-with-ROCm</link>
                 
<guid>https://tvm.apache.org/2017/10/30/Bringing-AMDGPUs-to-TVM-Stack-and-NNVM-Compiler-with-ROCm</guid>
-                <pubDate>Mon, 30 Oct 2017 00:00:00 -0700</pubDate>
+                <pubDate>Mon, 30 Oct 2017 00:00:00 -0400</pubDate>
         </item>
 
         <item>
@@ -4582,7 +4582,7 @@ We also learns from Halide when implementing the lowering 
pipeline in TVM.&lt;/l
 </description>
                 
<link>https://tvm.apache.org/2017/10/06/nnvm-compiler-announcement</link>
                 
<guid>https://tvm.apache.org/2017/10/06/nnvm-compiler-announcement</guid>
-                <pubDate>Fri, 06 Oct 2017 08:30:00 -0700</pubDate>
+                <pubDate>Fri, 06 Oct 2017 11:30:00 -0400</pubDate>
         </item>
 
 
