Mousius commented on PR #71: URL: https://github.com/apache/tvm-rfcs/pull/71#issuecomment-1132872196
Hi @junrushao1994, thanks for the elaborate reply 😸 I don't want to debate our personal principles but I appreciate you sharing them and will reference them where I can. > **Current `arch`-specifc checks.** Most of the 6 groups of `arch`-specific helper functions, mentioned in the "Motivation" section, are developed by concurrent efforts from multiple parties, and therefore I would say fragmentation is almost a certain thing to happen. On the other hand, fragmentation of those helper functions, which are indeed ad-hoc, is currently confined locally as independent checks without yet polluting the design of global infrastructure. Yip, this fragmentation occurs due to a lack of a standardised mechanism for these groups to use, which the RFC aims to provide - I'm not sure why we consider that pollution given it should have a positive impact on all groups and aids in providing customisation through a well defined route established by all vendors, providing simplicity and customisation (A1, A2 and A3). > Existing arch-like attributes. Note that the design of Target attributes in TargetKind is intended to describe the capability of hardware, according to the [Target RFC](https://discuss.tvm.apache.org/t/rfc-tvm-target-specification/6844): I believe this is still held given the crossover between hardware and code generators, for `c`/`llvm` you will still be able to pass `mattr`/`mcpu`/etc which are the standard way to specify CPU, ISA and extensions in existing compilers (A1). This is also following the TVM Guideline of "Be consistent with existing well-known package’s APIs if the features overlap. For example, tensor operation APIs should always be consistent with the numpy API.". > Target tag system. Given the fact that existing hardware models are enumerable, the [Target RFC](https://discuss.tvm.apache.org/t/rfc-tvm-target-specification/6844) proposes to use "tags" to allow easy creation of targets. For example, Target("nvidia/jetson-agx-xavier") gives full specification of this device, including the cuda target and the ARM host. At the time of writing, it only takes 200 tags to describe [all the CUDA hardware](https://github.com/apache/tvm/blob/a6a34046c432b3766e7c32bbd85c098812a12a68/src/target/tag.cc#L107-L348). > **What on earth are `Target`s.** Actually, `target` in TVM not only refers to the hardware, but also the codegen targets. For example, LLVM targets means TVM codegens to LLVM, and let LLVM do the rest of compilation. CUDA targets codegens to CUDA source code (i.e. `*.cu` files), and invokes NVCC for compilation. There's definitely an existing amount of confusion in the `Target` system, but I think it's even more confused than this by looking at the tagged entries, such as: ``` TVM_REGISTER_TARGET_TAG("raspberry-pi/4b-aarch64") .set_config({{"kind", String("llvm")}, {"mtriple", String("aarch64-linux-gnu")}, {"mcpu", String("cortex-a72")}, {"mattr", Array<String>{"+neon"}}, {"num-cores", Integer(4)}, {"host", Map<String, ObjectRef>{{"kind", String("llvm")}, {"mtriple", String("aarch64-linux-gnu")}, {"mcpu", String("cortex-a72")}, {"mattr", Array<String>{"+neon"}}, {"num-cores", Integer(4)}}}}); ``` Which defines a system configuration that then uses a code generator `llvm` with a specific CPU profile, therefore the `Target` system can represent **at minimum** 3 distinct layers: systems/hardware/code generators. Given the principle of having to understand a minimal amount (A2), `Target` needs to be streamlined into understandable parts. This is a digression from the actual RFC, as the features are pre-processed from the tagged `Target`s information. The problem that I've faced trying to figure out how to implement this is that `llvm` takes `mcpu`/`mattr`/etc which are used to infer features later where-as the `cuda` `Target` has a more complete reflection of the `attrs` available for CUDA. These inconsistencies make it difficult for a user to approach TVM and that isn't requiring a developers to learn the bare minimum (A2). It also diverges from other compilers where you'd expect `mcpu`/`mtriple`/etc to infer much of this information for you (A1). > Do we need multiple target parsers? My answer is no. If the parsers are maintained by different vendors separately on their own interest, then they could decide how to implement parsers for "keys", "arch", together, without conflicting with other contributors. Therefore, I would say it's already consistent with our principle A3 without having to develop multiple parsers. > Naming choice. When implementing the Target RFC, I came up with the preprocessor to be backward compatible with existing functionalities that auto-detects the local enviroment (e.g. cuda version), which I have to admit it's probably not the best name. In the context of our RFC, we might want to use names like target_parser instead to be more specific. I don't mind using one parser, I was just making it more granular and avoiding stepping on peoples toes with the existing pre-processor. Either design seems to fulfil your principle of customisability (A3), more granularly means you only have to implement a small piece of customisation where-as with the single parser it requires some thought as to how to cater for all attributes within a given `TargetKind` parser. > **Where to dispatch target parsers.** Currently, the preprocessor is dispatched solely based on `TargetKind`, which is admittedly a bit limited and overlooked the possiblity that `aarch64` and `x86` may need completely different parsers. Therefore, here we would love to raise a question: based on which attribute the parser should be dispatched? Previous wisdom in clang seems to suggest dispatching according to `mtriple` (IIUC), so shall we introduce anything similar, for example, dispatching based on `--device aarch64-foo-bar`? This requires developers to learn more than the bare minimum (A2), as users are required to learn the specifics of how to specify a `Target` in TVM. If an `llvm` `Target` supports `-mcpu`/`-mattr`/`-mtriple` then all of these fields can be reasonably expected to allow the `Target` to infer features given the relation to GCC/LLVM (A1). > Distinguishing arch and attrs. Note that the number of possible attributes grow less frequently as we might expect. Also, there is no fundamental difference between arch and attrs given arch is special attrs, and target is designed for this according to point C5 in the [Target RFC](https://discuss.tvm.apache.org/t/rfc-tvm-target-specification/6844). Therefore, I would personally prefer a more flattened access pattern, and therefore we do not need to distinguish arch and attrs. Based on LLVM, the `Features` inferred from a given CPU/GPU profile can be extensive already, which would lead to a drastic increase in `attrs` if we were to track them this way. For example: * https://github.com/llvm/llvm-project/blob/09c2b7c35af8c4bad39f03e9f60df8bd07323028/llvm/lib/Support/X86TargetParser.cpp * https://github.com/llvm/llvm-project/blob/09c2b7c35af8c4bad39f03e9f60df8bd07323028/llvm/lib/Support/AArch64TargetParser.cpp If we were to attach all of these directly to `Target` (i.e. `llvm`), that would drastically increase the number of available fields and in all cases only a subset would be used - specific to a given CPU/GPU profile. Therefore I would strongly advise following the pattern (as in LLVM) of using a separate structure for this, clear boundaries provide a much more straight forward developer experience (A2) and allow us to do things as @Lunderberg suggested like plucking just `features` or `arch` to give to `Pass`es rather than the entire `Target`. > **Folder structure.** According to principle A3, I would propose that different vendors may maintain their own TargetKind and parsers separately, for example, aarch64 could be maintained using: > > * `src/target/target/aarch64/parser.cc` > * `src/target/target/aarch64/tags.cc` > * `src/target/target/aarch64/kind.cc` > * `src/target/target/aarch64/helpers.cc` > Where the last item provides pre-defined packed functions mentioned in the "Motivation" section. I believe the proposed layout also conforms to A3 as it already details two different pre-processors, and you can easily imagine it extending: * src/target/preprocessors/aarch64.cc * src/target/preprocessors/cpu.cc * src/target/preprocessors/cuda.cc * src/target/preprocessors/woofles.cc * src/target/preprocessors/x86_64.cc The composition of `CPU` calling `AArch64` or `x86_64` conditionally also provides multiple vendors to provide pre-processing functionality to `llvm` by virtue of being able to call their own functions within the central `CPU` pre-processor. # Outcomes Will try and summarise the above into changes in the RFC which I can implement 😸 ## Single Parser I can re-word the RFC to use a single parser (`target_parser`) rather than the three separate ones, using the syntax you suggested: ```c++ using TargetJSON = Map<String, ObjectRef>; using FTVMTargetParser = TypedPackedFunc<TargetJSON(TargetJSON)>; ``` ## `Target` Field I'm strongly in favour of having `target.features` to represent this, do you actively object to this? ## Location I believe the `src/target/parsers` folder would provide for maximum customisation given parsers can be re-used between `TargetKind`s and therefore serves as a good starting point? Also happy for an RFC in 6 months to move everything to a new folder if we re-factor the `Target`s 😸 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
