junrushao1994 commented on PR #71:
URL: https://github.com/apache/tvm-rfcs/pull/71#issuecomment-1132579480

   Thanks @Mousius for drafing this RFC!
   
   First of all, I completely agree on the importance to handle `arch`-specific 
checks. Use our experience as an example, on CUDA we might want to check if the 
PTX intrinsic `cp.async.commit_group` is available on certain architecture 
before tensorizing using that PTX intrinsic, and the existing approach is to 
add extra flags in `TargetKind`.
   
   To motivate discussion, I would love to provide more context on the system 
design, and then talk about some specific points I noticed in the current RFC, 
and finally propose some ideas inspired by this RFC.
   
   ## Background
   
   **Design principles.** Just wanted to share some of my design principles 
when developing TVM, the compiler I believe that aims to work across hardware 
models:
   - A1. Generic. By design, TVM serves the common interest of all vendors. 
Therefore, it would be great if we could discuss broadly how a design choice 
would benefit all hardware platforms, including ARM, NV, Intel, Qual, etc.
   - A2. Minimal. Ideally, practitioners are supposed to learn only bare 
minimum to participate in TVM development.
     - Take TIR as an example, for a developer who knows the IR and passes to 
use and develop TIR, the only extra concept is `Target` .
     - Indeed Relay is in a less ideal state, but Relax is designed to address 
this issue so I wouldn't worry too much
   - A3. Customizable. We want to develop the infrastructure to provide 
customization opportunities for all vendors to independently work on their 
extension in a collaborative way, making TVM a platform for everyone. A few 
examples:
     - TIR scheduling is heading towards this direction that all the schedule 
primitives are decoupled with each other, so that vendors could develop their 
own primitives without affecting each other.
     - TVMScript is going to be re-designed this way treating TIR and Relax as 
independent dialects while the core infra itself supports any 3rdparty IR.
     - Admittedly Relay is much limited with the assumption of Relay => TE => 
TIR lowering path, with some hooks to hack in other compilation paths, but 
Relax is going to change this so again I wouldn't worry...
   
   **Current `arch`-specifc checks.** Most of the 6 groups of `arch`-specific 
helper functions, mentioned in the "Motivation" section, are developed by 
concurrent efforts from multiple parties, and therefore I would say 
fragmentation is almost a certain thing to happen. On the other hand, 
fragmentation of those helper functions, which are indeed ad-hoc, is currently 
confined locally as independent checks without yet polluting the design of 
global infrastructure.
   
   **Special target attributes.** Below are a few existing target attributes 
that serve special semantics, the design of which I don't fully agree with, but 
now are preserved as legacy:
   - **keys.** This is used to guide TOPI dispatch. For example, "llvm 
--keys=arm_cpu,cpu" first finds if there is any `GenericFunc` registered with 
`arm_cpu`, and if not, it falls back to `cpu`.
   - **libs.** This is used in TOPI to dispatch to vendor library. For example, 
"cuda -libs=cudnn" prefers dispatching to cuDNN.
   - **device.** and **model.** These two sometimes control the behavior of 
auto-tuning.
   
   **Existing `arch`-like attributes.** Note that the design of `Target` 
attributes in `TargetKind` is intended to describe the capability of hardware, 
according to the [Target 
RFC](https://discuss.tvm.apache.org/t/rfc-tvm-target-specification/6844):
   > Pass in additional target-specific attributes like ISA/library extension 
to a target.
   Therefore, currently all the `arch`-like flags are centered around `Target`. 
As an example:
   - The [Vulkan 
target](https://github.com/apache/tvm/blob/a6a34046c432b3766e7c32bbd85c098812a12a68/src/target/target_kind.cc#L345-L370)
 comprehensively includes hardware feature support (e.g. whether or not fp16 is 
supported), physical limits of the device (e.g. max number of threads allowed).
   - The [CUDA 
target](https://github.com/apache/tvm/blob/a6a34046c432b3766e7c32bbd85c098812a12a68/src/target/target_kind.cc#L292-L296)
 is defined in similar approach but less comprehensive yet. It doesn't require 
architectural change to grow its attributes.
   
   Note that the number of attributes may grow if there is new hardware 
feature, but to the best of my knowledge, it could be less common that those 
hardware features may bloat, mainly because there is concrete cost to grow 
features on hardwares.
   
   **Target tag system.** Given the fact that existing hardware models are 
enumerable, the [Target 
RFC](https://discuss.tvm.apache.org/t/rfc-tvm-target-specification/6844) 
proposes to use "tags" to allow easy creation of targets. For example, 
`Target("nvidia/jetson-agx-xavier")` gives full specification of this device, 
including the cuda target and the ARM host. At the time of writing, it only 
takes 200 tags to describe [all the CUDA 
hardware](https://github.com/apache/tvm/blob/a6a34046c432b3766e7c32bbd85c098812a12a68/src/target/tag.cc#L107-L348).
   
   **What on earth are `Target`s.** Actually, `target` in TVM not only refers 
to the hardware, but also the codegen targets. For example, LLVM targets means 
TVM codegens to LLVM, and let LLVM do the rest of compilation. CUDA targets 
codegens to CUDA source code (i.e. `*.cu` files), and invokes NVCC for 
compilation.
   
   ## Discussion
   
   **Naming choice.** When implementing the Target RFC, I came up with the 
`preprocessor` to be backward compatible with existing functionalities that 
auto-detects the local enviroment (e.g. cuda version), which I have to admit 
it's probably not the best name. In the context of our RFC, we might want to 
use names like `target_parser` instead to be more specific.
   
   **Where to dispatch target parsers.** Currently, the preprocessor is 
dispatched solely based on `TargetKind`, which is admittedly a bit limited and 
overlooked the possiblity that `aarch64` and `x86` may need completely 
different parsers. Therefore, here we would love to raise a question: based on 
which attribute the parser should be dispatched? Previous wisdom in clang seems 
to suggest dispatching according to `mtriple` (IIUC), so shall we introduce 
anything similar, for example, dispatching based on `--device aarch64-foo-bar`?
   
   **Do we need multiple target parsers?** My answer is no. If the parsers are 
maintained by different vendors separately on their own interest, then they 
could decide how to implement parsers for "keys", "arch", together, without 
conflicting with other contributors. Therefore, I would say it's already 
consistent with our principle A3 without having to develop multiple parsers.
   
   **Function Signature for a target parser.** Note that with the introduction 
of previous Target RFC, a target object is canonically represented by a 
JSON-like object. Therefore, the signature could be:
   ```c++
   using TargetJSON = Map<String, ObjectRef>;
   using FTVMTargetParser = TypedPackedFunc<TargetJSON(TargetJSON)>;
   ```
   so that it's generic enough and could potentially be useful if vendors in 
the future want to hack around the Target object. To clarify the discussion 
[here](https://github.com/apache/tvm-rfcs/pull/71/files#r876079831), I would 
propose that our parsing code path not go back to string again, i.e.:
   - str -> TargetJSON (canonical form) -> TargetJSON (target parser applied) 
-> Target
   - TargetJSON (canonical form) -> TargetJSON (target parser applied) -> Target
   
   **Distinguishing `arch` and `attrs`.** Note that the number of possible 
attributes grow less frequently as we might expect. Also, there is no 
fundamental difference between `arch` and `attrs` given `arch` is special 
attrs, and target is designed for this according to point C5 in the [Target 
RFC](https://discuss.tvm.apache.org/t/rfc-tvm-target-specification/6844). 
Therefore, I would personally prefer a more flattened access pattern, and 
therefore we do not need to distinguish `arch` and `attrs`.
   
   **Folder structure.** According to principle A3, I would propose that 
different vendors may maintain their own TargetKind and parsers separately, for 
example, aarch64 could be maintained using:
   - `src/target/target/aarch64/parser.cc`
   - `src/target/target/aarch64/tags.cc`
   - `src/target/target/aarch64/kind.cc`
   - `src/target/target/aarch64/helpers.cc`
   Where the last item provides pre-defined packed functions mentioned in the 
"Motivation" section.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to