masahi commented on a change in pull request #4258: [WIP][TVM] Bring Your Own Codegen to TVM
URL: https://github.com/apache/incubator-tvm/pull/4258#discussion_r352310360
##########
File path: tutorials/dev/custom_relay_backend.py
##########
@@ -0,0 +1,291 @@
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+"""
+
+.. _tutorial-custom-relay-backend:
+
+Bring Your Own Codegen To TVM
+=============================
+**Author**: `Zhi Chen <https://github.com/zhiics>`_, `Cody Hao Yu <https:://github.com/comaniac>`_
+
+As the hardware devices targeted by deep learning workloads keep increasing, the required knowledge
+for users to achieve high performance on various devices keeps increasing as well. To free data
+scientists from worrying about the performance when developing a new model, hardware vendors either
+provide libraries such as MKLDNN or cuDNN with many commonly used deep learning operators,
+or provide frameworks such as TensorRT to let users describe their models in a certain way to
+achieve high performance. However, users have to learn a new programming interface when they
+attempt to work on a new library or device. As a result, the demand of a unified programming
+interface becomes more and more important to 1) let all users and hardware vendors stand on the
+same page, and 2) provide a feasible solution to allow a specialized hardware or library to only
+support widely used operators with extremely high performance, but fallback unsupported operators
+to general devices like CPU/GPU.
+
+In this tutorial, we demonstrate how a hardware vendor can easily implement
+a Relay backend to support a specialized hardware device/library. It mainly
+takes three steps: 1) define whether an operator is supported under a given
+template, 2) specify how to compile and serialize the supported operators so
+that it can ingest TVM specific data format, e.g. NDArray, and 3) specify how
+to execute the compiled operators on a certain device. We will demonstrate how
+to add a new backend that uses open source compilers (e.g. GCC, LLVM, etc) or any
+proprietary compilers to execute a subgraph of a model without the exposure of
+the IP of customer's codegen tool chain. Note that you will need to add the
+specialized Relay backend to the TVM codebase and rebuild TVM for enabling.
+
+"""
+
+######################################################################
+# Define The Supported Operators
+# ------------------------------
+# The first step is to define which operators are supported by your backend.
+# A template is provided to ease vendor's effort to add the supported
+# operators.
+#
+# For example, We create a new Python file at python/relay/backend/op/contrib/gcc/extern_op.py,
+# and implement a set of boolean functions with corresponding operator names. A boolean
+# function should return `True` if we allow it to be executed by the given backend; `False`
+# otherwise.
+
+from __future__ import absolute_import
+
+def conv2d(attrs, args):
+    """Check if the external codegen should be used.
+    """
+    return False
+
+def subtract(attrs, args):
+    """Check if the external codegen should be used.
+    """
+    return True
+
+def add(attrs, args):
+    """Check if the external codegen should be used.
+    """
+    return True
+
+def multiply(attrs, args):
+    """Check if the external codegen should be used.
+    """
+    return True
+
+######################################################################
+# Note that since we include `attrs` and `args` into the function signature, we
+# can define more complicated rules. For example, we can only support conv2d
+# with float32 data type or with kernel size 1x1. In addition, the vendors can
+# also check the attributes associated with a given operator to decide if it is
+# supported by checking the fields in `attrs`. In an even more complicated but
+# interesting scenario, we also allow developers to check the sequence of
+# operators through iterating on the `agrs`. However, this is only
+# unidirectional as only the inputs are visible.
+#
+# After annotating whether an operator can be executed on the given backend.
+# Users can directly invoke the partitioning pass to separate the graph into
+# multiple segments. The C++ backend implements a partitioning pass to fulfill
+# the task and creates subgraphs/sub-functions with *External* attribute,
+# indicating that this function will be handled by external codegen tool.
+# Therefore, Relay passes should skip optimizations on them.
+
+######################################################################
+# Customize Subgraph Annotations
+# ------------------------------
+# In addition to specifying a set of rules for supported operators, we can also implement
+# a Relay IR mutator to find the supported subgraphs, which may include multiple operators,
+# for the target backend. Here we implement an annotator that includes an entire Relay graph
+# to be offloaded. Specifically, we are going to do two tasks:
+# - insert `subgraph_begin` after all input variables
+# - insert `subgraph_end` before the primary output. For example, given a Relay graph as follows:
+#           input_a
+#              |
+#             add --- input_b
+#              |
+#          subtract --- input_c
+#              |
+#          multiply --- input_d
+#              |
+#             out
+#
+# Our goal is to mutate the graph to the following:
+#
+#           input_a
+#              |
+#        subgraph_begin
+#              |
+#             add --- subgraph_begin --- input_b
+#              |
+#          subtract --- subgraph_begin --- input_c
+#              |
+#          multiply --- subgraph_begin --- input_d
+#              |
+#         subgraph_end
+#              |
+#             out
+#
+# The implementation is shown as follows. As can be seen, the annotator is derived from
+# `ExprMutator` that traverses a Relay graph and allows we to mutate it. We know that all ops

Review comment:
allow us
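
A side note on the paragraph about `attrs` and `args` in the quoted tutorial: the rule it mentions ("only support conv2d with float32 data type or with kernel size 1x1") could look roughly like the sketch below. The `dtype` and `kernel_size` field names and the `SimpleNamespace` stand-ins are assumptions made for illustration only; the objects Relay actually passes to these functions may expose this information differently.

```python
from types import SimpleNamespace


def conv2d(attrs, args):
    """Check if the external codegen should be used.

    Sketch of a finer-grained rule: offload conv2d only when the first
    input is float32 or the kernel is 1x1.
    """
    data = args[0]
    # Assumed field name: `dtype` on the input (hypothetical stand-in).
    is_float32 = getattr(data, "dtype", None) == "float32"
    # Assumed field name: `kernel_size` on the attributes (hypothetical stand-in).
    is_1x1 = tuple(getattr(attrs, "kernel_size", ())) == (1, 1)
    return is_float32 or is_1x1


# Stand-in objects (not TVM types), used only to exercise the rule.
fp32_input = SimpleNamespace(dtype="float32")
int8_input = SimpleNamespace(dtype="int8")
k3 = SimpleNamespace(kernel_size=(3, 3))
k1 = SimpleNamespace(kernel_size=(1, 1))

print(conv2d(k3, [fp32_input]))  # True: float32 input
print(conv2d(k1, [int8_input]))  # True: 1x1 kernel
print(conv2d(k3, [int8_input]))  # False: neither condition holds
```

Keeping the function name identical to the operator name matches the convention described in the tutorial, so a rule like this would replace the unconditional `return False` rather than add a new entry point.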
