[incubator-tvm-site] branch main updated: Update stale link

tqchen Tue, 03 Nov 2020 06:05:39 -0800

This is an automated email from the ASF dual-hosted git repository.

tqchen pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/incubator-tvm-site.git



The following commit(s) were added to refs/heads/main by this push:
     new ce2f2e9  Update stale link
ce2f2e9 is described below

commit ce2f2e98787d1c332a4c1e895dbd2150b3e2afcb
Author: tqchen <tianqi.tc...@gmail.com>
AuthorDate: Tue Nov 3 09:01:49 2020 -0500

    Update stale link
---
 ...g-GPU-Operators-with-TVM-A-Depthwise-Convolution-Example.md |  6 +++---
 ...ringing-AMDGPUs-to-TVM-Stack-and-NNVM-Compiler-with-ROCm.md |  2 +-
 _posts/2018-07-12-vta-release-announcement.markdown            | 10 +++++-----
 _posts/2018-08-10-DLPack-Bridge.md                             |  2 +-
 _posts/2018-12-18-lowprecision-conv.md                         |  4 ++--
 _posts/2019-01-19-Golang.md                                    |  6 +++---
 _posts/2019-04-30-opt-cuda-quantized.md                        | 10 +++++-----
 7 files changed, 20 insertions(+), 20 deletions(-)

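Every hunk below applies one of two mechanical URL substitutions:

- `https://github.com/dmlc/tvm/blob/master/` becomes `https://github.com/apache/incubator-tvm/blob/main/`.
- The image host prefix `https://raw.githubusercontent.com/uwsaml/` becomes `https://raw.githubusercontent.com/uwsampl/`.

As an illustration only, here is a minimal Python sketch of such a rewrite. It assumes the posts live in `_posts/` with `.md` or `.markdown` extensions. The names `LINK_REWRITES`, `update_stale_links`, and `rewrite_posts` are hypothetical, and this is not the tooling (if any) used to produce this commit.

```python
from __future__ import annotations

import pathlib
import re

# Hypothetical rewrite table inferred from the hunks in this commit.
# Only exact URL prefixes are matched, so links such as
# https://github.com/dmlc/tvm/issues/215 are left untouched.
LINK_REWRITES = [
    # TVM source links: old dmlc repo on "master" -> Apache incubator repo on "main".
    (re.compile(re.escape("https://github.com/dmlc/tvm/blob/master/")),
     "https://github.com/apache/incubator-tvm/blob/main/"),
    # VTA blog images: misspelled "uwsaml" organisation -> "uwsampl".
    (re.compile(re.escape("https://raw.githubusercontent.com/uwsaml/")),
     "https://raw.githubusercontent.com/uwsampl/"),
]


def update_stale_links(text: str) -> tuple[str, int]:
    """Apply every rewrite rule; return the updated text and the replacement count."""
    total = 0
    for pattern, replacement in LINK_REWRITES:
        text, count = pattern.subn(replacement, text)
        total += count
    return text, total


def rewrite_posts(posts_dir: str = "_posts") -> int:
    """Rewrite stale links in place in every .md/.markdown post; return the total count."""
    root = pathlib.Path(posts_dir)
    if not root.is_dir():
        # Nothing to do if the directory does not exist.
        return 0
    total = 0
    for path in sorted(root.iterdir()):
        if path.suffix not in (".md", ".markdown"):
            continue
        original = path.read_text(encoding="utf-8")
        updated, count = update_stale_links(original)
        if count:
            # Only touch files that actually contained a stale link.
            path.write_text(updated, encoding="utf-8")
            total += count
    return total
```

Run `rewrite_posts()` from the root of a site checkout. It edits matching posts in place and returns the number of replaced URLs, so review the result with `git diff` before committing. The rules match full URL prefixes rather than the bare string `dmlc/tvm`. That keeps links this commit leaves alone, such as `https://github.com/dmlc/tvm/issues/215`, unchanged.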
diff --git a/_posts/2017-08-22-Optimize-Deep-Learning-GPU-Operators-with-TVM-A-Depthwise-Convolution-Example.md b/_posts/2017-08-22-Optimize-Deep-Learning-GPU-Operators-with-TVM-A-Depthwise-Convolution-Example.md
index 494d531..388b4bb 100644
--- a/_posts/2017-08-22-Optimize-Deep-Learning-GPU-Operators-with-TVM-A-Depthwise-Convolution-Example.md
+++ b/_posts/2017-08-22-Optimize-Deep-Learning-GPU-Operators-with-TVM-A-Depthwise-Convolution-Example.md
@@ -409,9 +409,9 @@ The advantage of operator fusion is obvious.
 This is not the end, TVM can do operator fusion in a smarter way. You may refer to [this](https://github.com/dmlc/tvm/issues/215) and read the source code provided below.
 
 ## Show me the code
-- Declare: [https://github.com/dmlc/tvm/blob/master/topi/python/topi/nn/depthwise_conv2d.py](https://github.com/dmlc/tvm/blob/master/topi/python/topi/nn/depthwise_conv2d.py)
-- Schedule: [https://github.com/dmlc/tvm/blob/master/topi/python/topi/cuda/depthwise_conv2d.py](https://github.com/dmlc/tvm/blob/master/topi/python/topi/cuda/depthwise_conv2d.py)
-- Test: [https://github.com/dmlc/tvm/blob/master/topi/recipe/conv/depthwise_conv2d_test.py](https://github.com/dmlc/tvm/blob/master/topi/recipe/conv/depthwise_conv2d_test.py)
+- Declare: [https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/nn/depthwise_conv2d.py](https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/nn/depthwise_conv2d.py)
+- Schedule: [https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/cuda/depthwise_conv2d.py](https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/cuda/depthwise_conv2d.py)
+- Test: [https://github.com/apache/incubator-tvm/blob/main/topi/recipe/conv/depthwise_conv2d_test.py](https://github.com/apache/incubator-tvm/blob/main/topi/recipe/conv/depthwise_conv2d_test.py)
 
 ## Acknowledgements
 The author has many thanks to Tianqi Chen for his helpful advice and inspiring discussion.
diff --git a/_posts/2017-10-30-Bringing-AMDGPUs-to-TVM-Stack-and-NNVM-Compiler-with-ROCm.md b/_posts/2017-10-30-Bringing-AMDGPUs-to-TVM-Stack-and-NNVM-Compiler-with-ROCm.md
index 3073e40..5500e39 100644
--- a/_posts/2017-10-30-Bringing-AMDGPUs-to-TVM-Stack-and-NNVM-Compiler-with-ROCm.md
+++ b/_posts/2017-10-30-Bringing-AMDGPUs-to-TVM-Stack-and-NNVM-Compiler-with-ROCm.md
@@ -82,7 +82,7 @@ The input images are taken from the original paper, and they are available [here
 ## A Note on performance
 
 
-The current support on ROCm focuses on the functionality coverage. We have already seen promising performance results by simply adopting existing TVM schedules for CUDA backend. For example, you can try running [the gemm test script](https://github.com/dmlc/tvm/blob/master/topi/recipe/gemm/cuda_gemm_square.py) in the TVM repository and see the result. For two types of cards we tested, the current gemm recipe for square matrix multiplication (not yet specifically optimized for AMD GPUs) a [...]
+The current support on ROCm focuses on the functionality coverage. We have already seen promising performance results by simply adopting existing TVM schedules for CUDA backend. For example, you can try running [the gemm test script](https://github.com/apache/incubator-tvm/blob/main/topi/recipe/gemm/cuda_gemm_square.py) in the TVM repository and see the result. For two types of cards we tested, the current gemm recipe for square matrix multiplication (not yet specifically optimized for A [...]
 This is already a promising start, as it is very hard to optimize performance to get to peak and we
 did not yet apply AMD GPU specific optimizations.
 We are starting to look at performance optimization and we expect more improvement to come.
diff --git a/_posts/2018-07-12-vta-release-announcement.markdown b/_posts/2018-07-12-vta-release-announcement.markdown
index 440d484..eb4e929 100644
--- a/_posts/2018-07-12-vta-release-announcement.markdown
+++ b/_posts/2018-07-12-vta-release-announcement.markdown
@@ -21,7 +21,7 @@ We are excited to announce the launch of the Versatile Tensor Accelerator (VTA,
 VTA is more than a standalone accelerator design: it’s an end-to-end solution that includes drivers, a JIT runtime, and an optimizing compiler stack based on TVM. The current release includes a behavioral hardware simulator, as well as the infrastructure to deploy VTA on low-cost FPGA hardware for fast prototyping. By extending the TVM stack with a customizable, and open source deep learning hardware accelerator design, we are exposing a transparent end-to-end deep learning stack from th [...]
 
 {:center}
-![image](https://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_stack.png){: width="50%"}
+![image](https://raw.githubusercontent.com/uwsampl/web-data/master/vta/blogpost/vta_stack.png){: width="50%"}
 {:center}
 
 The VTA and TVM stack together constitute a blueprint for end-to-end, accelerator-centric deep learning system that can:
@@ -76,7 +76,7 @@ The Vanilla Tensor Accelerator (VTA) is a generic deep learning accelerator buil
 The design is inspired by mainstream deep learning accelerators, of the likes of Google's TPU accelerator. The design adopts decoupled access-execute to hide memory access latency and maximize utilization of compute resources. To a broader extent, VTA can serve as a template deep learning accelerator design, exposing a clean tensor computation abstraction to the compiler stack.
 
 {:center}
-![image](https://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_overview.png){: width="60%"}
+![image](https://raw.githubusercontent.com/uwsampl/web-data/master/vta/blogpost/vta_overview.png){: width="60%"}
 {:center}
 
 The figure above presents a high-level overview of the VTA hardware organization. VTA is composed of four modules that communicate between each other via FIFO queues and single-writer/single-reader SRAM memory blocks, to allow for task-level pipeline parallelism.
@@ -95,7 +95,7 @@ This simulator back-end is readily available for developers to experiment with.
 The second approach relies on an off-the-shelf and low-cost FPGA development board -- the [Pynq board](http://www.pynq.io/), which exposes a reconfigurable FPGA fabric and an ARM SoC.
 
 {:center}
-![image](https://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_system.png){: width="70%"}
+![image](https://raw.githubusercontent.com/uwsampl/web-data/master/vta/blogpost/vta_system.png){: width="70%"}
 {:center}
 
 The VTA release offers a simple compilation and deployment flow of the VTA hardware design and TVM workloads on the Pynq platform, with the help of an RPC server interface.
@@ -120,7 +120,7 @@ A popular method used to assess the efficient use of hardware are roofline diagr
 In the left half, convolution layers are bandwidth limited, whereas on the right half, they are compute limited.
 
 {:center}
-![image](https://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_roofline.png){: width="60%"}
+![image](https://raw.githubusercontent.com/uwsampl/web-data/master/vta/blogpost/vta_roofline.png){: width="60%"}
 {:center}
 
 The goal behind designing a hardware architecture, and a compiler stack is to bring each workload as close as possible to the roofline of the target hardware.
@@ -131,7 +131,7 @@ The result is an overall higher utilization of the available compute and memory
 ### End to end ResNet-18 evaluation
 
 {:center}
-![image](https://raw.githubusercontent.com/uwsaml/web-data/master/vta/blogpost/vta_e2e.png){: width="60%"}
+![image](https://raw.githubusercontent.com/uwsampl/web-data/master/vta/blogpost/vta_e2e.png){: width="60%"}
 {:center}
 
 A benefit of having a complete compiler stack built for VTA is the ability to run end-to-end workloads. This is compelling in the context of hardware acceleration because we need to understand what performance bottlenecks, and Amdahl limitations stand in the way to obtaining faster performance.
diff --git a/_posts/2018-08-10-DLPack-Bridge.md b/_posts/2018-08-10-DLPack-Bridge.md
index fb4b2e2..8b9d875 100644
--- a/_posts/2018-08-10-DLPack-Bridge.md
+++ b/_posts/2018-08-10-DLPack-Bridge.md
@@ -126,7 +126,7 @@ We can repeat the same example, but using MxNet instead:
 
 Under the hood of the PyTorch Example
 -------------------------------------
-As TVM provides [functions](https://github.com/dmlc/tvm/blob/master/include/tvm/runtime/c_runtime_api.h#L455) to convert dlpack tensors to tvm `NDArray`s and
+As TVM provides [functions](https://github.com/apache/incubator-tvm/blob/main/include/tvm/runtime/c_runtime_api.h#L455) to convert dlpack tensors to tvm `NDArray`s and
 vice-versa, so all that is needed is some syntactic sugar by wrapping functions.
 `convert_func` is a generic converter for frameworks using tensors with dlpack
 support, and can be used to implement convenient converters, such as
diff --git a/_posts/2018-12-18-lowprecision-conv.md b/_posts/2018-12-18-lowprecision-conv.md
index f8d2c62..58a4c1c 100644
--- a/_posts/2018-12-18-lowprecision-conv.md
+++ b/_posts/2018-12-18-lowprecision-conv.md
@@ -155,8 +155,8 @@ Note: x86 doesn’t support a vectorized popcount for this microarchitecture, so
 
 ## Show me the code
 
-- [TOPI bitserial convolution](https://github.com/dmlc/tvm/blob/master/topi/python/topi/nn/bitserial_conv2d.py)
-- [TOPI ARM cpu bitserial convolution](https://github.com/dmlc/tvm/blob/master/topi/python/topi/arm_cpu/bitserial_conv2d.py)
+- [TOPI bitserial convolution](https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/nn/bitserial_conv2d.py)
+- [TOPI ARM cpu bitserial convolution](https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/arm_cpu/bitserial_conv2d.py)
 
 
 ## References
diff --git a/_posts/2019-01-19-Golang.md b/_posts/2019-01-19-Golang.md
index 7825345..d559b14 100644
--- a/_posts/2019-01-19-Golang.md
+++ b/_posts/2019-01-19-Golang.md
@@ -147,13 +147,13 @@ func main() {
 ```
 
 ```gotvm``` extends the TVM packed function system to support golang function closures as packed functions.
-[Examples](https://github.com/dmlc/tvm/blob/master/golang/sample) available to register golang
+[Examples](https://github.com/apache/incubator-tvm/blob/main/golang/sample) available to register golang
 closure as TVM packed function and invoke the same across programming language barriers.
 
 ## Show me the code
 
-- [Package Source](https://github.com/dmlc/tvm/blob/master/golang/src)
-- [Examples](https://github.com/dmlc/tvm/blob/master/golang/sample)
+- [Package Source](https://github.com/apache/incubator-tvm/blob/main/golang/src)
+- [Examples](https://github.com/apache/incubator-tvm/blob/main/golang/sample)
 
 ## References
 
diff --git a/_posts/2019-04-30-opt-cuda-quantized.md b/_posts/2019-04-30-opt-cuda-quantized.md
index ecacd6e..777d946 100644
--- a/_posts/2019-04-30-opt-cuda-quantized.md
+++ b/_posts/2019-04-30-opt-cuda-quantized.md
@@ -73,7 +73,7 @@ Figure 2. 2D convolution with data layout in NCHW4c and weight layout in OIHW4o4
 <b>Right</b>: The output in NCHW4c layout. Inside the one element depicted, there are four packed elements in channel sub-dimension.
 </div><p></p>
 
-After we have specified the layout of convolution layers, other operators such as `add` and activations can automatically adapt to the chosen layout during the [AlterOpLayout](https://github.com/dmlc/tvm/blob/master/src/relay/pass/alter_op_layout.cc) pass in Relay.
+After we have specified the layout of convolution layers, other operators such as `add` and activations can automatically adapt to the chosen layout during the [AlterOpLayout](https://github.com/apache/incubator-tvm/blob/main/src/relay/pass/alter_op_layout.cc) pass in Relay.
 The layout transformation of the weight can be precomputed offline. Therefore, we can run the whole model in the same layout without extra overhead.
 
 ## Designing Search Space for Automatic Optimization
@@ -138,10 +138,10 @@ We show that automatic optimization in TVM makes it easy and flexible to support
 
 # Show Me the Code
 * [Benchmark](https://github.com/vinx13/tvm-cuda-int8-benchmark)
-* [CUDA int8 conv2d](https://github.com/dmlc/tvm/blob/master/topi/python/topi/cuda/conv2d_int8.py)
-* [CUDA int8 group conv2d](https://github.com/dmlc/tvm/blob/master/topi/python/topi/cuda/group_conv2d_nchw.py)
-* [CUDA int8 dense](https://github.com/dmlc/tvm/blob/master/topi/python/topi/cuda/dense.py)
-* [Tensor intrinsics declaration](https://github.com/dmlc/tvm/blob/master/topi/python/topi/cuda/tensor_intrin.py) 
+* [CUDA int8 conv2d](https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/cuda/conv2d_int8.py)
+* [CUDA int8 group conv2d](https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/cuda/group_conv2d_nchw.py)
+* [CUDA int8 dense](https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/cuda/dense.py)
+* [Tensor intrinsics declaration](https://github.com/apache/incubator-tvm/blob/main/topi/python/topi/cuda/tensor_intrin.py) 
 
 # Bio & Acknowledgement
 [Wuwei Lin](https://wuwei.io/) is an undergraduate student at SJTU. He is currently an intern at TuSimple. The author has many thanks to [Tianqi Chen](https://homes.cs.washington.edu/~tqchen/) and [Eddie Yan](https://homes.cs.washington.edu/~eqy/) for their reviews.
