[GitHub] [tvm-rfcs] Mousius commented on pull request #71: Add Target Pre-processing RFC

GitBox Fri, 20 May 2022 05:59:35 -0700


Mousius commented on PR #71:
URL: https://github.com/apache/tvm-rfcs/pull/71#issuecomment-1132872196


   Hi @junrushao1994, thanks for the elaborate reply 😸  I don't want to debate 
our personal principles but I appreciate you sharing them and will reference 
them where I can.
   
   > **Current `arch`-specifc checks.** Most of the 6 groups of `arch`-specific 
helper functions, mentioned in the "Motivation" section, are developed by 
concurrent efforts from multiple parties, and therefore I would say 
fragmentation is almost a certain thing to happen. On the other hand, 
fragmentation of those helper functions, which are indeed ad-hoc, is currently 
confined locally as independent checks without yet polluting the design of 
global infrastructure.
   
   Yip, this fragmentation occurs due to a lack of a standardised mechanism for 
these groups to use, which the RFC aims to provide - I'm not sure why we 
consider that pollution given it should have a positive impact on all groups 
and aids in providing customisation through a well defined route established by 
all vendors, providing simplicity and customisation (A1, A2 and A3).
   
   > Existing arch-like attributes. Note that the design of Target attributes 
in TargetKind is intended to describe the capability of hardware, according to 
the [Target 
RFC](https://discuss.tvm.apache.org/t/rfc-tvm-target-specification/6844):
   
   I believe this is still held given the crossover between hardware and code 
generators, for `c`/`llvm` you will still be able to pass `mattr`/`mcpu`/etc 
which are the standard way to specify CPU, ISA and extensions in existing 
compilers (A1). This is also following the TVM Guideline of "Be consistent with 
existing well-known package’s APIs if the features overlap.
     For example, tensor operation APIs should always be consistent with the 
numpy API.".
   
   > Target tag system. Given the fact that existing hardware models are 
enumerable, the [Target 
RFC](https://discuss.tvm.apache.org/t/rfc-tvm-target-specification/6844) 
proposes to use "tags" to allow easy creation of targets. For example, 
Target("nvidia/jetson-agx-xavier") gives full specification of this device, 
including the cuda target and the ARM host. At the time of writing, it only 
takes 200 tags to describe [all the CUDA 
hardware](https://github.com/apache/tvm/blob/a6a34046c432b3766e7c32bbd85c098812a12a68/src/target/tag.cc#L107-L348).
   
   > **What on earth are `Target`s.** Actually, `target` in TVM not only refers 
to the hardware, but also the codegen targets. For example, LLVM targets means 
TVM codegens to LLVM, and let LLVM do the rest of compilation. CUDA targets 
codegens to CUDA source code (i.e. `*.cu` files), and invokes NVCC for 
compilation.
   
   There's definitely an existing amount of confusion in the `Target` system, 
but I think it's even more confused than this by looking at the tagged entries, 
such as:
   
   ```
   TVM_REGISTER_TARGET_TAG("raspberry-pi/4b-aarch64")
       .set_config({{"kind", String("llvm")},
                    {"mtriple", String("aarch64-linux-gnu")},
                    {"mcpu", String("cortex-a72")},
                    {"mattr", Array<String>{"+neon"}},
                    {"num-cores", Integer(4)},
                    {"host", Map<String, ObjectRef>{{"kind", String("llvm")},
                                                    {"mtriple", 
String("aarch64-linux-gnu")},
                                                    {"mcpu", 
String("cortex-a72")},
                                                    {"mattr", 
Array<String>{"+neon"}},
                                                    {"num-cores", 
Integer(4)}}}});
   ```
   
   Which defines a system configuration that then uses a code generator `llvm` 
with a specific CPU profile, therefore the `Target` system can represent **at 
minimum** 3 distinct layers: systems/hardware/code generators. Given the 
principle of having to understand a minimal amount (A2), `Target` needs to be 
streamlined into understandable parts. This is a digression from the actual 
RFC, as the features are pre-processed from the tagged `Target`s information.
   
   The problem that I've faced trying to figure out how to implement this is 
that `llvm` takes `mcpu`/`mattr`/etc which are used to infer features later 
where-as the `cuda` `Target` has a more complete reflection of the `attrs` 
available for CUDA. These inconsistencies make it difficult for a user to 
approach TVM and that isn't requiring a developers to learn the bare minimum 
(A2). It also diverges from other compilers where you'd expect 
`mcpu`/`mtriple`/etc to infer much of this information for you (A1).
   
   > Do we need multiple target parsers? My answer is no. If the parsers are 
maintained by different vendors separately on their own interest, then they 
could decide how to implement parsers for "keys", "arch", together, without 
conflicting with other contributors. Therefore, I would say it's already 
consistent with our principle A3 without having to develop multiple parsers.
   
   > Naming choice. When implementing the Target RFC, I came up with the 
preprocessor to be backward compatible with existing functionalities that 
auto-detects the local enviroment (e.g. cuda version), which I have to admit 
it's probably not the best name. In the context of our RFC, we might want to 
use names like target_parser instead to be more specific.
   
   I don't mind using one parser, I was just making it more granular and 
avoiding stepping on peoples toes with the existing pre-processor. Either 
design seems to fulfil your principle of customisability (A3), more granularly 
means you only have to implement a small piece of customisation where-as with 
the single parser it requires some thought as to how to cater for all 
attributes within a given `TargetKind` parser.
   
   > **Where to dispatch target parsers.** Currently, the preprocessor is 
dispatched solely based on `TargetKind`, which is admittedly a bit limited and 
overlooked the possiblity that `aarch64` and `x86` may need completely 
different parsers. Therefore, here we would love to raise a question: based on 
which attribute the parser should be dispatched? Previous wisdom in clang seems 
to suggest dispatching according to `mtriple` (IIUC), so shall we introduce 
anything similar, for example, dispatching based on `--device aarch64-foo-bar`?
   
   This requires developers to learn more than the bare minimum (A2), as users 
are required to learn the specifics of how to specify a `Target` in TVM. If an 
`llvm` `Target` supports `-mcpu`/`-mattr`/`-mtriple` then all of these fields 
can be reasonably expected to allow the `Target` to infer features given the 
relation to GCC/LLVM (A1). 
   
   > Distinguishing arch and attrs. Note that the number of possible attributes 
grow less frequently as we might expect. Also, there is no fundamental 
difference between arch and attrs given arch is special attrs, and target is 
designed for this according to point C5 in the [Target 
RFC](https://discuss.tvm.apache.org/t/rfc-tvm-target-specification/6844). 
Therefore, I would personally prefer a more flattened access pattern, and 
therefore we do not need to distinguish arch and attrs.
   
   Based on LLVM, the `Features` inferred from a given CPU/GPU profile can be 
extensive already, which would lead to a drastic increase in `attrs` if we were 
to track them this way. For example: 
   
   * 
https://github.com/llvm/llvm-project/blob/09c2b7c35af8c4bad39f03e9f60df8bd07323028/llvm/lib/Support/X86TargetParser.cpp
   * 
https://github.com/llvm/llvm-project/blob/09c2b7c35af8c4bad39f03e9f60df8bd07323028/llvm/lib/Support/AArch64TargetParser.cpp
   
   If we were to attach all of these directly to `Target` (i.e. `llvm`), that 
would drastically increase the number of available fields and in all cases only 
a subset would be used - specific to a given CPU/GPU profile. Therefore I would 
strongly advise following the pattern (as in LLVM) of using a separate 
structure for this, clear boundaries provide a much more straight forward 
developer experience (A2) and allow us to do things as @Lunderberg suggested 
like plucking just `features` or `arch` to give to `Pass`es rather than the 
entire `Target`.
   
   > **Folder structure.** According to principle A3, I would propose that 
different vendors may maintain their own TargetKind and parsers separately, for 
example, aarch64 could be maintained using:
   > 
   > * `src/target/target/aarch64/parser.cc`
   > * `src/target/target/aarch64/tags.cc`
   > * `src/target/target/aarch64/kind.cc`
   > * `src/target/target/aarch64/helpers.cc`
   >   Where the last item provides pre-defined packed functions mentioned in 
the "Motivation" section.
   
   I believe the proposed layout also conforms to A3 as it already details two 
different pre-processors, and you can easily imagine it extending:
   
   * src/target/preprocessors/aarch64.cc
   * src/target/preprocessors/cpu.cc
   * src/target/preprocessors/cuda.cc
   * src/target/preprocessors/woofles.cc
   * src/target/preprocessors/x86_64.cc
   
   The composition of `CPU` calling `AArch64` or `x86_64` conditionally also 
provides multiple vendors to provide pre-processing functionality to `llvm` by 
virtue of being able to call their own functions within the central `CPU` 
pre-processor.
   
   # Outcomes
   Will try and summarise the above into changes in the RFC which I can 
implement 😸 
   
   ## Single Parser
   I can re-word the RFC to use a single parser (`target_parser`) rather than 
the three separate ones, using the syntax you suggested:
   
   ```c++
   using TargetJSON = Map<String, ObjectRef>;
   using FTVMTargetParser = TypedPackedFunc<TargetJSON(TargetJSON)>;
   ```
   
   ## `Target` Field
   I'm strongly in favour of having `target.features` to represent this, do you 
actively object to this?
   
   ## Location
   I believe the `src/target/parsers` folder would provide for maximum 
customisation given parsers can be re-used between `TargetKind`s and therefore 
serves as a good starting point? Also happy for an RFC in 6 months to move 
everything to a new folder if we re-factor the `Target`s 😸 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [tvm-rfcs] Mousius commented on pull request #71: Add Target Pre-processing RFC

Reply via email to