This is an automated email from the ASF dual-hosted git repository. tqchen pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/incubator-tvm-site.git
The following commit(s) were added to refs/heads/master by this push:
     new d60217b  WebGPU blog (#8)
d60217b is described below

commit d60217bfad6507f3997718253c74cb5e4143b236
Author: Tianqi Chen <tqc...@users.noreply.github.com>
AuthorDate: Thu May 14 10:59:24 2020 -0700

    WebGPU blog (#8)
---
 ...g-machine-learning-to-webassembly-and-webgpu.md | 88 +++++++++++++++++++++
 images/webgpu/ml-compiler-flow.png                 | Bin 0 -> 197380 bytes
 images/webgpu/tvm-wasm-stack.png                   | Bin 0 -> 412428 bytes
 images/webgpu/webgpu-mobilenet-perf.png            | Bin 0 -> 90966 bytes
 4 files changed, 88 insertions(+)

diff --git a/_posts/2020-05-14-compiling-machine-learning-to-webassembly-and-webgpu.md b/_posts/2020-05-14-compiling-machine-learning-to-webassembly-and-webgpu.md
new file mode 100644
index 0000000..a24fae6
--- /dev/null
+++ b/_posts/2020-05-14-compiling-machine-learning-to-webassembly-and-webgpu.md
@@ -0,0 +1,88 @@

---
layout: post
title: 'Compiling Machine Learning to WASM and WebGPU with Apache TVM'
author: Tianqi Chen and Jared Roesch, OctoML
date: 2020-05-14
---

**TLDR**

We introduced support for WASM and WebGPU to the Apache TVM deep learning compiler. Our experiments show that TVM's WebGPU backend can get **close to native GPU performance** when deploying models to the web.

{:center: style="text-align: center"}
![image](/images/webgpu/webgpu-mobilenet-perf.png){: width="55%"}<br />
{:center}

## Introduction

Computing is one of the pillars of modern machine learning applications. The introduction of the GPU to accelerate deep learning workloads has increased the rate of progress dramatically. Given the growing requirement to deploy machine learning everywhere, the browser becomes a natural place to deploy intelligent applications.

While TensorFlow.js and ONNX.js are existing efforts to bring machine learning to the browser, non-trivial gaps in performance still exist between the web versions and native ones.
One of the many reasons is the lack of standard and performant access to the GPU on the web. WebGL lacks important features such as compute shaders and generic storage buffers that are necessary for high-performance deep learning.

WebGPU is the upcoming standard for next-generation web graphics and has the potential to dramatically change this situation. Like the latest-generation graphics APIs such as Vulkan and Metal, WebGPU offers first-class compute shader support.

To explore the potential of using WebGPU for machine learning deployment in the browser, we enhanced the deep learning compiler Apache (incubating) TVM to target WASM (for host code that computes the launching parameters and calls into the device launch) and WebGPU (for device execution). Our preliminary results are quite positive: for the first time, we can deploy machine learning applications on the web while still getting near-native performance on the GPU.

## Machine Learning Compiler

{:center: style="text-align: center"}
![image](/images/webgpu/ml-compiler-flow.png){: width="65%"}<br />
{:center}

One natural reaction when trying out WebGPU is to write shaders for primitive operators in deep neural networks (matrix multiplication and convolution) and then directly optimize their performance. This is the traditional workflow used by existing frameworks such as TensorFlow.js.

Instead, we apply a compilation-based approach. TVM automatically ingests models from high-level frameworks such as TensorFlow, Keras, PyTorch, MXNet and ONNX, and uses a machine-learning-driven approach to automatically generate low-level code, in this case compute shaders in SPIR-V format. The generated code can then be packaged as a deployable module.

One important advantage of the compilation-based approach is the reuse of infrastructure.
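The ingest-and-build step can be sketched with TVM's Python API. This is a hedged sketch rather than the exact recipe used here: the `webgpu` target string and the wasm32 host-target flags are assumptions based on TVM's web examples of the era, and actually running it requires a TVM build with SPIR-V codegen and wasm32 LLVM support.

```python
def build_webgpu_module(onnx_model, shape_dict):
    """Sketch: compile an ONNX model into a WASM + WebGPU deployable module.

    Assumptions (illustrative, not verified against a specific TVM version):
    the "webgpu" device target emits SPIR-V compute shaders, and the LLVM
    host-target flags below produce wasm32 host code as a system library.
    """
    import tvm              # imported lazily; needs a TVM build with
    from tvm import relay   # SPIR-V and wasm32 LLVM backends enabled

    # Ingest the high-level model into Relay IR.
    mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)

    target = "webgpu"  # device code: SPIR-V compute shaders
    target_host = "llvm -mtriple=wasm32-unknown-unknown-wasm -system-lib"

    with tvm.transform.PassContext(opt_level=3):
        # Returns a module bundling the host code and generated shaders,
        # which can then be linked into a single WASM file.
        return relay.build(mod, target=target,
                           target_host=target_host, params=params)
```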
We are able to effortlessly (relative to [other approaches](https://arxiv.org/abs/1901.05350)) target the web by reusing the infrastructure for optimizing GPU kernels for native platforms such as CUDA, Metal and OpenCL. If the mapping of the WebGPU API to native APIs is efficient, we can expect similar performance with very little work. More importantly, the [AutoTVM](https://tvm.apache.org/2018/10/0 [...]

## Building a WASM and WebGPU Compiler

In order to build a compiler that can target WASM and WebGPU, we need the following elements:

- A SPIR-V generator for compute shaders.
- A WASM generator for the host program.
- A runtime to load and execute the generated program.

Luckily, TVM already has a SPIR-V target for Vulkan, and uses LLVM for host code generation, so we can repurpose the two to generate the device and host programs.

The main challenge is the runtime. We need a runtime to load the shader code and to enable the host code to communicate with the shader correctly. TVM has a minimal C++-based runtime. We build a minimal web runtime library and link it with the generated shader and host driving code, producing a single WASM file. However, this WASM module still contains two unknown dependencies:

- The runtime needs to call into system libraries (malloc, stderr).
- The WASM runtime needs to interact with the WebGPU driver (in JavaScript, where the WebGPU API is a first-class citizen).

WASI is the standard solution to the first problem. While there is not yet a mature WASI implementation on the web, we can use Emscripten to generate a WASI-like library (see discussion [here](https://github.com/emscripten-core/emscripten/issues/11075)) to provide these system libraries.

We solve the second problem by building a WebGPU runtime inside TVM's JS runtime, and calling back to these functions from the WASM module when invoking GPU code.
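In spirit, this callback arrangement is a registry of host-provided closures that generated code invokes by name. The pure-Python stand-in below illustrates only the idea; the names (`register_func`, `call_packed`, `wasm.WebGPULaunch`) are illustrative and are not TVM's actual runtime API:

```python
# Stand-in for the callback mechanism described above: the "JS runtime"
# registers closures under string names, and "device" launches issued from
# the WASM side resolve and call them through a single entry point.
# All names here are illustrative, not TVM's actual PackedFunc API.

_registry = {}

def register_func(name, func):
    """JS-runtime side: expose a closure to the WASM module."""
    _registry[name] = func

def call_packed(name, *args):
    """WASM side: invoke a host-provided function by name."""
    if name not in _registry:
        raise KeyError(f"function not registered: {name}")
    return _registry[name](*args)

# The "JS runtime" registers a (hypothetical) WebGPU launch shim.
launched = []
register_func("wasm.WebGPULaunch",
              lambda kernel, nthreads: launched.append((kernel, nthreads)))

# Generated host code, conceptually running inside WASM, launches a shader.
call_packed("wasm.WebGPULaunch", "fused_conv2d_relu", 1024)
print(launched)  # [('fused_conv2d_relu', 1024)]
```

Because the closures cross the boundary as opaque handles, the host program never needs to know how the JavaScript side implements a launch, which is what keeps most of the runtime logic out of the WASM module.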
Using the [PackedFunc](https://tvm.apache.org/docs/dev/runtime.html#packedfunc) mechanism in TVM's runtime system, we can directly expose high-level runtime primitives by passing JavaScript closures to the WASM interface. This approach keeps most of the runtime code in JavaScript; we could bring more JS code into the WASM runti [...]

{:center: style="text-align: center"}
![image](/images/webgpu/tvm-wasm-stack.png){: width="65%"}
{:center}

## Performance

{:center: style="text-align: center"}
![image](/images/webgpu/webgpu-mobilenet-perf.png){: width="65%"}
{:center}

We ran a quick experiment comparing the execution of a full computational graph via TVM's WebGPU backend and native targets that use native GPU runtimes (Metal and OpenCL). On the MobileNet model, we find that WebGPU can get close to matching the performance of Metal. Assuming that Chrome's WebGPU runtime targets Metal instead of OpenCL on macOS, we can safely assume there is little to no performance loss when targeting the GPU.

This benchmark excludes the CPU-to-GPU data copy cost and only measures GPU execution. Currently, the data copy from CPU to GPU can still take 25% of the execution time; however, these costs can be further amortized via approaches like double buffering in a continuous execution setting.

Our reported end-to-end running time of MobileNet is by no means optimal, since we simply reused tuned programs from a GTX 1080 Ti, which is very different from the Intel graphics GPU. We expect a further performance boost from using [AutoTVM](https://tvm.apache.org/2018/10/03/auto-opt-all) on the target platform of interest.

## Looking to the Future

Our results suggest many interesting opportunities for machine learning on the web. Notably, WebGPU is an API that is still evolving and its implications could go beyond web applications.
For example, one could target the native APIs of WebGPU as it matures and becomes standardized through WASI, enabling standalone WASM applications that make use of WebGPU.

The TVM community is also actively working on a [Rust-based runtime](https://github.com/apache/incubator-tvm/tree/master/rust) that would enable much more robust WASM support and easier interaction with projects like [wgpu](https://github.com/gfx-rs/wgpu-rs) and the [Rust WASM](https://rustwasm.github.io/docs/book/) ecosystem. As an open source project, we are looking for contributors who can bring in new ideas and help push the project in these exciting directions.

The proposed approach provides effective machine learning support for most of WASM's application scenarios. The close-to-native performance could unlock better [federated learning](https://en.wikipedia.org/wiki/Federated_learning) capabilities in the browser. The same compiled package should also be able to run on native WASM executors to provide a sandbox for the applications.

## Show me the Code

- [Example project for image classification](https://github.com/tqchen/tvm-webgpu-example)
- [Apache TVM on github](https://github.com/apache/incubator-tvm/tree/master/web)

## Acknowledgement

We would like to thank the Emscripten project for providing the WASM compilation infrastructure as well as the JS library support on the web. We would also like to thank the WebGPU community for various helpful discussions. Thanks to Fletcher Haynes for valuable feedback on this post.
diff --git a/images/webgpu/ml-compiler-flow.png b/images/webgpu/ml-compiler-flow.png
new file mode 100644
index 0000000..93ee58f
Binary files /dev/null and b/images/webgpu/ml-compiler-flow.png differ
diff --git a/images/webgpu/tvm-wasm-stack.png b/images/webgpu/tvm-wasm-stack.png
new file mode 100644
index 0000000..a6033ec
Binary files /dev/null and b/images/webgpu/tvm-wasm-stack.png differ
diff --git a/images/webgpu/webgpu-mobilenet-perf.png b/images/webgpu/webgpu-mobilenet-perf.png
new file mode 100644
index 0000000..f402d09
Binary files /dev/null and b/images/webgpu/webgpu-mobilenet-perf.png differ