This is an automated email from the ASF dual-hosted git repository.
tqchen pushed a commit to branch asf-site
in repository https://gitbox.apache.org/repos/asf/incubator-tvm-site.git
The following commit(s) were added to refs/heads/asf-site by this push:
new c506653 Build at Mon Mar 30 15:47:27 PDT 2020
c506653 is described below
commit c506653d9088268817ed126e709cd084331b1516
Author: tqchen <[email protected]>
AuthorDate: Mon Mar 30 15:47:27 2020 -0700
Build at Mon Mar 30 15:47:27 PDT 2020
---
2018/07/12/vta-release-announcement.html | 10 +++++-----
2019/03/18/tvm-apache-announcement.html | 2 +-
atom.xml | 14 +++++++-------
rss.xml | 16 ++++++++--------
vta.html | 4 ++--
5 files changed, 23 insertions(+), 23 deletions(-)
diff --git a/2018/07/12/vta-release-announcement.html
b/2018/07/12/vta-release-announcement.html
index 9304549..08c2b6e 100644
--- a/2018/07/12/vta-release-announcement.html
+++ b/2018/07/12/vta-release-announcement.html
@@ -168,7 +168,7 @@
<p>VTA is more than a standalone accelerator design: it’s an end-to-end
solution that includes drivers, a JIT runtime, and an optimizing compiler stack
based on TVM. The current release includes a behavioral hardware simulator, as
well as the infrastructure to deploy VTA on low-cost FPGA hardware for fast
prototyping. By extending the TVM stack with a customizable, open-source
deep learning hardware accelerator design, we are exposing a transparent
end-to-end deep learning stack from [...]
-<p style="text-align: center"><img
src="http://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_stack.png"
alt="image" width="50%" /></p>
+<p style="text-align: center"><img
src="https://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_stack.png"
alt="image" width="50%" /></p>
<p>The VTA and TVM stack together constitute a blueprint for an end-to-end,
accelerator-centric deep learning system that can:</p>
@@ -223,7 +223,7 @@ The extendability of the compiler stack, combined with the
ability to modify the
<p>The Vanilla Tensor Accelerator (VTA) is a generic deep learning accelerator
built around a GEMM core, which performs dense matrix multiplication at high
computational throughput.
The design is inspired by mainstream deep learning accelerators such as
Google’s TPU. It adopts decoupled access-execute to hide memory access latency
and maximize utilization of compute resources. More broadly, VTA can serve as a
template deep learning accelerator design, exposing a clean tensor computation
abstraction to the compiler stack.</p>
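To make that abstraction concrete, here is a minimal, hypothetical sketch of
declaring a dense matrix multiply with TVM's tensor expression (tvm.te) API;
the shapes, dtypes, and names are illustrative and not part of this commit:

    import tvm
    from tvm import te

    # Illustrative shapes; VTA's GEMM core would consume a tiled int8 variant.
    n, m, k = 1024, 1024, 1024
    A = te.placeholder((n, k), name="A", dtype="int8")
    B = te.placeholder((k, m), name="B", dtype="int8")
    r = te.reduce_axis((0, k), name="r")

    # A clean tensor computation: C[i, j] = sum_r A[i, r] * B[r, j]
    C = te.compute(
        (n, m),
        lambda i, j: te.sum(A[i, r].astype("int32") * B[r, j].astype("int32"),
                            axis=r),
        name="C",
    )

    # Schedules then lower this declaration onto a backend such as VTA.
    s = te.create_schedule(C.op)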
-<p style="text-align: center"><img
src="http://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_overview.png"
alt="image" width="60%" /></p>
+<p style="text-align: center"><img
src="https://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_overview.png"
alt="image" width="60%" /></p>
<p>The figure above presents a high-level overview of the VTA hardware
organization. VTA is composed of four modules that communicate with one
another via FIFO queues and single-writer/single-reader SRAM memory blocks to
allow for task-level pipeline parallelism.
The compute module performs both dense linear algebra computation with its
GEMM core and general computation with its tensor ALU.
@@ -240,7 +240,7 @@ The first approach, which doesn’t require special hardware
is to run deep lear
This simulator back-end is readily available for developers to experiment with.
The second approach relies on an off-the-shelf, low-cost FPGA development
board – the <a href="http://www.pynq.io/">Pynq board</a>, which exposes a
reconfigurable FPGA fabric and an ARM SoC.</p>
-<p style="text-align: center"><img
src="http://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_system.png"
alt="image" width="70%" /></p>
+<p style="text-align: center"><img
src="https://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_system.png"
alt="image" width="70%" /></p>
<p>The VTA release offers a simple compilation and deployment flow for the VTA
hardware design and TVM workloads on the Pynq platform, with the help of an RPC
server interface.
The RPC server handles FPGA reconfiguration tasks and the offloading of TVM
module invocations onto the VTA runtime.
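As a hedged sketch of that flow (the host address, port, and bitstream
argument below are placeholders, and helper names may differ across VTA
releases):

    import tvm.rpc
    import vta

    # Hypothetical address/port of a Pynq board running the VTA RPC server.
    remote = tvm.rpc.connect("192.168.2.99", 9091)

    # Reprogram the FPGA and reset the runtime over RPC; passing None for the
    # bitstream is assumed here to select a prebuilt default.
    vta.program_fpga(remote, bitstream=None)
    vta.reconfig_runtime(remote)

    # Compiled TVM modules can then be uploaded and invoked on the VTA runtime.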
@@ -263,7 +263,7 @@ While this platform is meant for prototyping (the 2012 FPGA
cannot compete with
<p>A popular method for assessing the efficient use of hardware is the
roofline diagram: given a hardware design, it shows how efficiently different
workloads utilize the hardware’s compute and memory resources. The roofline
plot below shows the throughput achieved on different convolution layers of the
ResNet-18 inference benchmark. Each layer has a different arithmetic intensity,
i.e., ratio of compute to data movement.
In the left half, convolution layers are bandwidth limited, whereas on the
right half, they are compute limited.</p>
-<p style="text-align: center"><img
src="http://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_roofline.png"
alt="image" width="60%" /></p>
+<p style="text-align: center"><img
src="https://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_roofline.png"
alt="image" width="60%" /></p>
<p>The goal behind designing a hardware architecture and a compiler stack is
to bring each workload as close as possible to the roofline of the target
hardware.
The roofline plot shows the effects of having the hardware and compiler work
together to maximize utilization of the available hardware resources.
@@ -272,7 +272,7 @@ The result is an overall higher utilization of the
available compute and memory
<h3 id="end-to-end-resnet-18-evaluation">End-to-end ResNet-18 evaluation</h3>
-<p style="text-align: center"><img
src="http://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_e2e.png"
alt="image" width="60%" /></p>
+<p style="text-align: center"><img
src="https://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_e2e.png"
alt="image" width="60%" /></p>
<p>A benefit of having a complete compiler stack built for VTA is the ability
to run end-to-end workloads. This is compelling in the context of hardware
acceleration because we need to understand what performance bottlenecks and
Amdahl’s-law limitations stand in the way of faster performance.
The bar plot above shows inference performance with and without offloading the
ResNet convolutional layers to the FPGA-based VTA design, on the Pynq board’s
ARM Cortex-A9 SoC.
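The “Amdahl limitations” mentioned above can be quantified with Amdahl’s law;
the fraction and speedup below are hypothetical, not measurements from the
plot:

    # Amdahl's law: end-to-end speedup when only a fraction p of runtime is
    # accelerated.
    p = 0.85   # assumed fraction of inference time spent in offloaded conv layers
    s = 10.0   # assumed speedup of those layers on the VTA design
    overall = 1.0 / ((1.0 - p) + p / s)
    print(f"end-to-end speedup: {overall:.2f}x")   # ~4.3x despite 10x on the kernels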
diff --git a/2019/03/18/tvm-apache-announcement.html
b/2019/03/18/tvm-apache-announcement.html
index 98e350d..b154327 100644
--- a/2019/03/18/tvm-apache-announcement.html
+++ b/2019/03/18/tvm-apache-announcement.html
@@ -168,7 +168,7 @@
<p style="text-align: center"><img src="/images/main/tvm-stack.png"
alt="image" width="70%" /></p>
-<p>The TVM stack began as a research project at the <a
href="https://sampl.cs.washington.edu/">SAMPL group</a> of the Paul G. Allen
School of Computer Science & Engineering, University of Washington. The project
uses the loop-level IR and several optimizations from the <a
href="http://halide-lang.org/">Halide project</a>, in addition to <a
href="https://tvm.ai/about">a full deep learning compiler stack</a> to support
machine learning workloads for diverse hardware backends.</p>
+<p>The TVM stack began as a research project at the <a
href="https://sampl.cs.washington.edu/">SAMPL group</a> of the Paul G. Allen
School of Computer Science & Engineering, University of Washington. The project
uses the loop-level IR and several optimizations from the <a
href="http://halide-lang.org/">Halide project</a>, in addition to <a
href="https://tvm.apache.org/about">a full deep learning compiler stack</a> to
support machine learning workloads for diverse hardware backends.</p>
<p>Since its introduction, the project has been driven by an open source
community involving multiple industry and academic institutions. Currently, the
TVM stack includes a high-level differentiable programming IR for high-level
optimization, a machine-learning-driven program optimizer, and VTA – a fully
open-sourced deep learning accelerator. The community brings innovations from
machine learning, compiler systems, programming languages, and computer
architecture to build a full-stack open s [...]
diff --git a/atom.xml b/atom.xml
index 4a77194..775f322 100644
--- a/atom.xml
+++ b/atom.xml
@@ -4,7 +4,7 @@
<title>TVM</title>
<link href="https://tvm.apache.org" rel="self"/>
<link href="https://tvm.apache.org"/>
- <updated>2020-03-30T11:16:12-07:00</updated>
+ <updated>2020-03-30T15:47:25-07:00</updated>
<id>https://tvm.apache.org</id>
<author>
<name></name>
@@ -269,7 +269,7 @@ We show that automatic optimization in TVM makes it easy
and flexible to support
<p style="text-align: center"><img
src="/images/main/tvm-stack.png" alt="image"
width="70%" /></p>
-<p>The TVM stack began as a research project at the <a
href="https://sampl.cs.washington.edu/">SAMPL group</a> of
the Paul G. Allen School of Computer Science &amp; Engineering, University of
Washington. The project uses the loop-level IR and several optimizations from
the <a href="http://halide-lang.org/">Halide project</a>,
in addition to <a href="https://tvm.ai/about">a full deep
learning compiler stack</a> to support [...]
+<p>The TVM stack began as a research project at the <a
href="https://sampl.cs.washington.edu/">SAMPL group</a> of
the Paul G. Allen School of Computer Science &amp; Engineering, University of
Washington. The project uses the loop-level IR and several optimizations from
the <a href="http://halide-lang.org/">Halide project</a>,
in addition to <a href="https://tvm.apache.org/about">a full
deep learning compiler stack</a> to [...]
<p>Since its introduction, the project has been driven by an open source
community involving multiple industry and academic institutions. Currently, the
TVM stack includes a high-level differentiable programming IR for high-level
optimization, a machine-learning-driven program optimizer, and VTA – a fully
open-sourced deep learning accelerator. The community brings innovations from
machine learning, compiler systems, programming languages, and computer
architecture to build a full-stack [...]
@@ -1276,7 +1276,7 @@ support, and can be used to implement convenient
converters, such as
<p>VTA is more than a standalone accelerator design: it’s an end-to-end
solution that includes drivers, a JIT runtime, and an optimizing compiler stack
based on TVM. The current release includes a behavioral hardware simulator, as
well as the infrastructure to deploy VTA on low-cost FPGA hardware for fast
prototyping. By extending the TVM stack with a customizable, open-source
deep learning hardware accelerator design, we are exposing a transparent
end-to-end deep learning stac [...]
-<p style="text-align: center"><img
src="http://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_stack.png"
alt="image" width="50%" /></p>
+<p style="text-align: center"><img
src="https://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_stack.png"
alt="image" width="50%" /></p>
<p>The VTA and TVM stack together constitute a blueprint for an end-to-end,
accelerator-centric deep learning system that can:</p>
@@ -1331,7 +1331,7 @@ The extendability of the compiler stack, combined with
the ability to modify the
<p>The Vanilla Tensor Accelerator (VTA) is a generic deep learning
accelerator built around a GEMM core, which performs dense matrix
multiplication at high computational throughput.
The design is inspired by mainstream deep learning accelerators such as
Google’s TPU. It adopts decoupled access-execute to hide memory access latency
and maximize utilization of compute resources. More broadly, VTA can serve as
a template deep learning accelerator design, exposing a clean tensor
computation abstraction to the compiler stack.</p>
-<p style="text-align: center"><img
src="http://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_overview.png"
alt="image" width="60%" /></p>
+<p style="text-align: center"><img
src="https://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_overview.png"
alt="image" width="60%" /></p>
<p>The figure above presents a high-level overview of the VTA hardware
organization. VTA is composed of four modules that communicate with one
another via FIFO queues and single-writer/single-reader SRAM memory blocks to
allow for task-level pipeline parallelism.
The compute module performs both dense linear algebra computation with its
GEMM core and general computation with its tensor ALU.
@@ -1348,7 +1348,7 @@ The first approach, which doesn’t require special
hardware is to run deep lear
This simulator back-end is readily available for developers to experiment with.
The second approach relies on an off-the-shelf, low-cost FPGA development
board – the <a href="http://www.pynq.io/">Pynq board</a>,
which exposes a reconfigurable FPGA fabric and an ARM SoC.</p>
-<p style="text-align: center"><img
src="http://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_system.png"
alt="image" width="70%" /></p>
+<p style="text-align: center"><img
src="https://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_system.png"
alt="image" width="70%" /></p>
<p>The VTA release offers a simple compilation and deployment flow for
the VTA hardware design and TVM workloads on the Pynq platform, with the help
of an RPC server interface.
The RPC server handles FPGA reconfiguration tasks and the offloading of TVM
module invocations onto the VTA runtime.
@@ -1371,7 +1371,7 @@ While this platform is meant for prototyping (the 2012
FPGA cannot compete with
<p>A popular method for assessing the efficient use of hardware is the
roofline diagram: given a hardware design, it shows how efficiently different
workloads utilize the hardware’s compute and memory resources. The roofline
plot below shows the throughput achieved on different convolution layers of the
ResNet-18 inference benchmark. Each layer has a different arithmetic intensity,
i.e., ratio of compute to data movement.
In the left half, convolution layers are bandwidth limited, whereas on the
right half, they are compute limited.</p>
-<p style="text-align: center"><img
src="http://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_roofline.png"
alt="image" width="60%" /></p>
+<p style="text-align: center"><img
src="https://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_roofline.png"
alt="image" width="60%" /></p>
<p>The goal behind designing a hardware architecture and a compiler
stack is to bring each workload as close as possible to the roofline of the
target hardware.
The roofline plot shows the effects of having the hardware and compiler work
together to maximize utilization of the available hardware resources.
@@ -1380,7 +1380,7 @@ The result is an overall higher utilization of the
available compute and memory
<h3 id="end-to-end-resnet-18-evaluation">End-to-end ResNet-18
evaluation</h3>
-<p style="text-align: center"><img
src="http://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_e2e.png"
alt="image" width="60%" /></p>
+<p style="text-align: center"><img
src="https://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_e2e.png"
alt="image" width="60%" /></p>
<p>A benefit of having a complete compiler stack built for VTA is the
ability to run end-to-end workloads. This is compelling in the context of
hardware acceleration because we need to understand what performance
bottlenecks and Amdahl’s-law limitations stand in the way of faster
performance.
The bar plot above shows inference performance with and without offloading the
ResNet convolutional layers to the FPGA-based VTA design, on the Pynq board’s
ARM Cortex-A9 SoC.
diff --git a/rss.xml b/rss.xml
index 967dd59..dc52cef 100644
--- a/rss.xml
+++ b/rss.xml
@@ -5,8 +5,8 @@
<description>TVM - </description>
<link>https://tvm.apache.org</link>
<atom:link href="https://tvm.apache.org" rel="self"
type="application/rss+xml" />
- <lastBuildDate>Mon, 30 Mar 2020 11:16:12 -0700</lastBuildDate>
- <pubDate>Mon, 30 Mar 2020 11:16:12 -0700</pubDate>
+ <lastBuildDate>Mon, 30 Mar 2020 15:47:25 -0700</lastBuildDate>
+ <pubDate>Mon, 30 Mar 2020 15:47:25 -0700</pubDate>
<ttl>60</ttl>
@@ -264,7 +264,7 @@ We show that automatic optimization in TVM makes it easy
and flexible to support
<p style="text-align: center"><img
src="/images/main/tvm-stack.png" alt="image"
width="70%" /></p>
-<p>The TVM stack began as a research project at the <a
href="https://sampl.cs.washington.edu/">SAMPL group</a> of
the Paul G. Allen School of Computer Science &amp; Engineering, University of
Washington. The project uses the loop-level IR and several optimizations from
the <a href="http://halide-lang.org/">Halide project</a>,
in addition to <a href="https://tvm.ai/about">a full deep
learning compiler stack</a> to support [...]
+<p>The TVM stack began as a research project at the <a
href="https://sampl.cs.washington.edu/">SAMPL group</a> of
the Paul G. Allen School of Computer Science &amp; Engineering, University of
Washington. The project uses the loop-level IR and several optimizations from
the <a href="http://halide-lang.org/">Halide project</a>,
in addition to <a href="https://tvm.apache.org/about">a full
deep learning compiler stack</a> to [...]
<p>Since its introduction, the project has been driven by an open source
community involving multiple industry and academic institutions. Currently, the
TVM stack includes a high-level differentiable programming IR for high-level
optimization, a machine-learning-driven program optimizer, and VTA – a fully
open-sourced deep learning accelerator. The community brings innovations from
machine learning, compiler systems, programming languages, and computer
architecture to build a full-stack [...]
@@ -1271,7 +1271,7 @@ support, and can be used to implement convenient
converters, such as
<p>VTA is more than a standalone accelerator design: it’s an end-to-end
solution that includes drivers, a JIT runtime, and an optimizing compiler stack
based on TVM. The current release includes a behavioral hardware simulator, as
well as the infrastructure to deploy VTA on low-cost FPGA hardware for fast
prototyping. By extending the TVM stack with a customizable, open-source
deep learning hardware accelerator design, we are exposing a transparent
end-to-end deep learning stac [...]
-<p style="text-align: center"><img
src="http://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_stack.png"
alt="image" width="50%" /></p>
+<p style="text-align: center"><img
src="https://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_stack.png"
alt="image" width="50%" /></p>
<p>The VTA and TVM stack together constitute a blueprint for an end-to-end,
accelerator-centric deep learning system that can:</p>
@@ -1326,7 +1326,7 @@ The extendability of the compiler stack, combined with
the ability to modify the
<p>The Vanilla Tensor Accelerator (VTA) is a generic deep learning
accelerator built around a GEMM core, which performs dense matrix
multiplication at high computational throughput.
The design is inspired by mainstream deep learning accelerators such as
Google’s TPU. It adopts decoupled access-execute to hide memory access latency
and maximize utilization of compute resources. More broadly, VTA can serve as
a template deep learning accelerator design, exposing a clean tensor
computation abstraction to the compiler stack.</p>
-<p style="text-align: center"><img
src="http://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_overview.png"
alt="image" width="60%" /></p>
+<p style="text-align: center"><img
src="https://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_overview.png"
alt="image" width="60%" /></p>
<p>The figure above presents a high-level overview of the VTA hardware
organization. VTA is composed of four modules that communicate with one
another via FIFO queues and single-writer/single-reader SRAM memory blocks to
allow for task-level pipeline parallelism.
The compute module performs both dense linear algebra computation with its
GEMM core and general computation with its tensor ALU.
@@ -1343,7 +1343,7 @@ The first approach, which doesn’t require special
hardware is to run deep lear
This simulator back-end is readily available for developers to experiment with.
The second approach relies on an off-the-shelf, low-cost FPGA development
board – the <a href="http://www.pynq.io/">Pynq board</a>,
which exposes a reconfigurable FPGA fabric and an ARM SoC.</p>
-<p style="text-align: center"><img
src="http://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_system.png"
alt="image" width="70%" /></p>
+<p style="text-align: center"><img
src="https://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_system.png"
alt="image" width="70%" /></p>
<p>The VTA release offers a simple compilation and deployment flow for
the VTA hardware design and TVM workloads on the Pynq platform, with the help
of an RPC server interface.
The RPC server handles FPGA reconfiguration tasks and the offloading of TVM
module invocations onto the VTA runtime.
@@ -1366,7 +1366,7 @@ While this platform is meant for prototyping (the 2012
FPGA cannot compete with
<p>A popular method for assessing the efficient use of hardware is the
roofline diagram: given a hardware design, it shows how efficiently different
workloads utilize the hardware’s compute and memory resources. The roofline
plot below shows the throughput achieved on different convolution layers of the
ResNet-18 inference benchmark. Each layer has a different arithmetic intensity,
i.e., ratio of compute to data movement.
In the left half, convolution layers are bandwidth limited, whereas on the
right half, they are compute limited.</p>
-<p style="text-align: center"><img
src="http://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_roofline.png"
alt="image" width="60%" /></p>
+<p style="text-align: center"><img
src="https://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_roofline.png"
alt="image" width="60%" /></p>
<p>The goal behind designing a hardware architecture and a compiler
stack is to bring each workload as close as possible to the roofline of the
target hardware.
The roofline plot shows the effects of having the hardware and compiler work
together to maximize utilization of the available hardware resources.
@@ -1375,7 +1375,7 @@ The result is an overall higher utilization of the
available compute and memory
<h3 id="end-to-end-resnet-18-evaluation">End-to-end ResNet-18
evaluation</h3>
-<p style="text-align: center"><img
src="http://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_e2e.png"
alt="image" width="60%" /></p>
+<p style="text-align: center"><img
src="https://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_e2e.png"
alt="image" width="60%" /></p>
<p>A benefit of having a complete compiler stack built for VTA is the
ability to run end-to-end workloads. This is compelling in the context of
hardware acceleration because we need to understand what performance
bottlenecks and Amdahl’s-law limitations stand in the way of faster
performance.
The bar plot above shows inference performance with and without offloading the
ResNet convolutional layers to the FPGA-based VTA design, on the Pynq board’s
ARM Cortex-A9 SoC.
diff --git a/vta.html b/vta.html
index e7ad980..ce54668 100644
--- a/vta.html
+++ b/vta.html
@@ -159,7 +159,7 @@ The current release includes a behavioral hardware
simulator, as well as the inf
By extending the TVM stack with a customizable, open-source deep learning
hardware accelerator design, we are exposing a transparent end-to-end deep
learning stack from the high-level deep learning framework down to the actual
hardware design and implementation.
This forms a truly end-to-end, software-to-hardware open source stack for
deep learning systems.</p>
-<p style="text-align: center"><img
src="http://raw.githubusercontent.com/uwsampl/web-data/master/vta/blogpost/vta_stack.png"
alt="image" width="50%" /></p>
+<p style="text-align: center"><img
src="https://raw.githubusercontent.com/uwsampl/web-data/master/vta/blogpost/vta_stack.png"
alt="image" width="50%" /></p>
<p>The VTA and TVM stack together constitute a blueprint for an end-to-end,
accelerator-centric deep learning system that can:</p>
@@ -174,7 +174,7 @@ TVM is now an effort undergoing incubation at The Apache
Software Foundation (AS
driven by an open source community involving multiple industry and academic
institutions
under the Apache way.</p>
-<p>Read more about VTA in the <a
href="https://tvm.ai/2018/07/12/vta-release-announcement.html">TVM blog
post</a>, or in the <a href="https://arxiv.org/abs/1807.04188">VTA
techreport</a>.</p>
+<p>Read more about VTA in the <a
href="https://tvm.apache.org/2018/07/12/vta-release-announcement.html">TVM blog
post</a>, or in the <a href="https://arxiv.org/abs/1807.04188">VTA
techreport</a>.</p>
</div>
</div>