[GitHub] [tvm-rfcs] junrushao1994 commented on pull request #71: Add Target Pre-processing RFC

GitBox Fri, 20 May 2022 00:36:01 -0700


junrushao1994 commented on PR #71:
URL: https://github.com/apache/tvm-rfcs/pull/71#issuecomment-1132579480

Thanks @Mousius for drafing this RFC!

First of all, I completely agree on the importance to handle `arch`-specific
checks. Use our experience as an example, on CUDA we might want to check if the
PTX intrinsic `cp.async.commit_group` is available on certain architecture
before tensorizing using that PTX intrinsic, and the existing approach is to
add extra flags in `TargetKind`.

To motivate discussion, I would love to provide more context on the system
design, and then talk about some specific points I noticed in the current RFC,
and finally propose some ideas inspired by this RFC.

## Background

**Design principles.** Just wanted to share some of my design principles
when developing TVM, the compiler I believe that aims to work across hardware
models:
- A1. Generic. By design, TVM serves the common interest of all vendors.
Therefore, it would be great if we could discuss broadly how a design choice
would benefit all hardware platforms, including ARM, NV, Intel, Qual, etc.
- A2. Minimal. Ideally, practitioners are supposed to learn only bare
minimum to participate in TVM development.
- Take TIR as an example, for a developer who knows the IR and passes to
use and develop TIR, the only extra concept is `Target` .
- Indeed Relay is in a less ideal state, but Relax is designed to address
this issue so I wouldn't worry too much
- A3. Customizable. We want to develop the infrastructure to provide
customization opportunities for all vendors to independently work on their
extension in a collaborative way, making TVM a platform for everyone. A few
examples:
- TIR scheduling is heading towards this direction that all the schedule
primitives are decoupled with each other, so that vendors could develop their
own primitives without affecting each other.
- TVMScript is going to be re-designed this way treating TIR and Relax as
independent dialects while the core infra itself supports any 3rdparty IR.
- Admittedly Relay is much limited with the assumption of Relay => TE =>
TIR lowering path, with some hooks to hack in other compilation paths, but
Relax is going to change this so again I wouldn't worry...

**Current `arch`-specifc checks.** Most of the 6 groups of `arch`-specific
helper functions, mentioned in the "Motivation" section, are developed by
concurrent efforts from multiple parties, and therefore I would say
fragmentation is almost a certain thing to happen. On the other hand,
fragmentation of those helper functions, which are indeed ad-hoc, is currently
confined locally as independent checks without yet polluting the design of
global infrastructure.

**Special target attributes.** Below are a few existing target attributes
that serve special semantics, the design of which I don't fully agree with, but
now are preserved as legacy:
- **keys.** This is used to guide TOPI dispatch. For example, "llvm
--keys=arm_cpu,cpu" first finds if there is any `GenericFunc` registered with
`arm_cpu`, and if not, it falls back to `cpu`.
- **libs.** This is used in TOPI to dispatch to vendor library. For example,
"cuda -libs=cudnn" prefers dispatching to cuDNN.
- **device.** and **model.** These two sometimes control the behavior of
auto-tuning.

**Existing `arch`-like attributes.** Note that the design of `Target`
attributes in `TargetKind` is intended to describe the capability of hardware,
according to the [Target
RFC](https://discuss.tvm.apache.org/t/rfc-tvm-target-specification/6844):
> Pass in additional target-specific attributes like ISA/library extension
to a target.
Therefore, currently all the `arch`-like flags are centered around `Target`.
As an example:
- The [Vulkan
target](https://github.com/apache/tvm/blob/a6a34046c432b3766e7c32bbd85c098812a12a68/src/target/target_kind.cc#L345-L370)
comprehensively includes hardware feature support (e.g. whether or not fp16 is
supported), physical limits of the device (e.g. max number of threads allowed).
- The [CUDA
target](https://github.com/apache/tvm/blob/a6a34046c432b3766e7c32bbd85c098812a12a68/src/target/target_kind.cc#L292-L296)
is defined in similar approach but less comprehensive yet. It doesn't require
architectural change to grow its attributes.

Note that the number of attributes may grow if there is new hardware
feature, but to the best of my knowledge, it could be less common that those
hardware features may bloat, mainly because there is concrete cost to grow
features on hardwares.

**Target tag system.** Given the fact that existing hardware models are
enumerable, the [Target
RFC](https://discuss.tvm.apache.org/t/rfc-tvm-target-specification/6844)
proposes to use "tags" to allow easy creation of targets. For example,
`Target("nvidia/jetson-agx-xavier")` gives full specification of this device,
including the cuda target and the ARM host. At the time of writing, it only
takes 200 tags to describe [all the CUDA
hardware](https://github.com/apache/tvm/blob/a6a34046c432b3766e7c32bbd85c098812a12a68/src/target/tag.cc#L107-L348).

**What on earth are `Target`s.** Actually, `target` in TVM not only refers
to the hardware, but also the codegen targets. For example, LLVM targets means
TVM codegens to LLVM, and let LLVM do the rest of compilation. CUDA targets
codegens to CUDA source code (i.e. `*.cu` files), and invokes NVCC for
compilation.

## Discussion

**Naming choice.** When implementing the Target RFC, I came up with the
`preprocessor` to be backward compatible with existing functionalities that
auto-detects the local enviroment (e.g. cuda version), which I have to admit
it's probably not the best name. In the context of our RFC, we might want to
use names like `target_parser` instead to be more specific.

**Where to dispatch target parsers.** Currently, the preprocessor is
dispatched solely based on `TargetKind`, which is admittedly a bit limited and
overlooked the possiblity that `aarch64` and `x86` may need completely
different parsers. Therefore, here we would love to raise a question: based on
which attribute the parser should be dispatched? Previous wisdom in clang seems
to suggest dispatching according to `mtriple` (IIUC), so shall we introduce
anything similar, for example, dispatching based on `--device aarch64-foo-bar`?

**Do we need multiple target parsers?** My answer is no. If the parsers are
maintained by different vendors separately on their own interest, then they
could decide how to implement parsers for "keys", "arch", together, without
conflicting with other contributors. Therefore, I would say it's already
consistent with our principle A3 without having to develop multiple parsers.

**Function Signature for a target parser.** Note that with the introduction
of previous Target RFC, a target object is canonically represented by a
JSON-like object. Therefore, the signature could be:
```c++
using TargetJSON = Map<String, ObjectRef>;
using FTVMTargetParser = TypedPackedFunc<TargetJSON(TargetJSON)>;
```
so that it's generic enough and could potentially be useful if vendors in
the future want to hack around the Target object. To clarify the discussion
[here](https://github.com/apache/tvm-rfcs/pull/71/files#r876079831), I would
propose that our parsing code path not go back to string again, i.e.:
- str -> TargetJSON (canonical form) -> TargetJSON (target parser applied)
-> Target
- TargetJSON (canonical form) -> TargetJSON (target parser applied) -> Target

**Distinguishing `arch` and `attrs`.** Note that the number of possible
attributes grow less frequently as we might expect. Also, there is no
fundamental difference between `arch` and `attrs` given `arch` is special
attrs, and target is designed for this according to point C5 in the [Target
RFC](https://discuss.tvm.apache.org/t/rfc-tvm-target-specification/6844).
Therefore, I would personally prefer a more flattened access pattern, and
therefore we do not need to distinguish `arch` and `attrs`.

**Folder structure.** According to principle A3, I would propose that
different vendors may maintain their own TargetKind and parsers separately, for
example, aarch64 could be maintained using:
- `src/target/target/aarch64/parser.cc`
- `src/target/target/aarch64/tags.cc`
- `src/target/target/aarch64/kind.cc`
- `src/target/target/aarch64/helpers.cc`
Where the last item provides pre-defined packed functions mentioned in the
"Motivation" section.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [tvm-rfcs] junrushao1994 commented on pull request #71: Add Target Pre-processing RFC

Reply via email to