junrushao1994 commented on PR #71: URL: https://github.com/apache/tvm-rfcs/pull/71#issuecomment-1133541236
@Mousius Thank you so much for your response! This makes lots of sense to me! Also, thanks for including my personal principles in the discussion! It's my personal principles which are completely okay to disagree with :-) > I'm not sure why we consider that pollution given it should have a positive impact on all groups and aids in providing customisation through a well defined route established by all vendors, providing simplicity and customisation (A1, A2 and A3). Sorry my words may lead to some potential miscommunication. As I mentioned in the very beginning, I completely agree with the idea of handling `arch` more systematically and it's important to us, so in no means I consider it pollution. I mean the fragmentation itself is not good but hasn't become extremely bad right now, because they are not used globally at infrastructural level, so there is no namespace polluting. > I believe this is still held given the crossover between hardware and code generators, for c/llvm you will still be able to pass mattr/mcpu/etc which are the standard way to specify CPU, ISA and extensions in existing compilers. Agreed. It is designed to be consistent with existing wisdom, i.e. LLVM/GCC mattr/mcpu/etc., so I believe it's not controversial. > There's definitely an existing amount of confusion in the Target system, but I think it's even more confused than this by looking at the tagged entries, such as: .... Which defines a system configuration that then uses a code generator llvm with a specific CPU profile, therefore the Target system can represent at minimum 3 distinct layers: systems/hardware/code generators. Yes, I think your understanding is correct. Per [Target RFC](https://discuss.tvm.apache.org/t/rfc-tvm-target-specification/6844), this object represents host compiler info (C0), heterogeneous compilation settings (C1), compiler behavior control (C2), additional ISA/library (C4). Therefore, the tag system (C4) is introduced to reduce the cognitive overhead, because the end users can use a simple plain string tag `Target("raspberry-pi/4b-aarch64")` to represent all C0/1/2/4, instead of understanding every detail of the hardware themselves (A2). In other words, to the end users, the vision of having tag system is that they only need a simple string to do configure TVM codegen on their their hardware. > where-as the cuda Target has a more complete reflection of the attrs available for CUDA. These inconsistencies make it difficult for a user to approach TVM and that isn't requiring a developers to learn the bare minimum (A2). It also diverges from other compilers where you'd expect mcpu/mtriple/etc to infer much of this information for you (A1). Right, in fact I share the same confusion with you with it comes to CUDA/Vulkan/etc., where (to best of my knowledge) there is no precedent of mcpu/mattr/mtriple/etc (except for LLVM NVPTX backend). Certainly, we should develop unified approach to address this confusion, which could possibly be the `arch` proposed in the RFC. > I don't mind using one parser, I was just making it more granular and avoiding stepping on peoples toes with the existing pre-processor. Either design seems to fulfil your principle of customisability (A3), more granularly means you only have to implement a small piece of customisation where-as with the single parser it requires some thought as to how to cater for all attributes within a given TargetKind parser. Thanks for sharing your thoughts! Definitely. The issue of having multiple parsers is that the order of their application is less explicit, so I would be a little bit leaning towards a single one, while the implementation could share common helper functions. > > Where to dispatch target parsers. Currently, the preprocessor is dispatched solely based on TargetKind, which is admittedly a bit limited and overlooked the possiblity that aarch64 and x86 may need completely different parsers. Therefore, here we would love to raise a question: based on which attribute the parser should be dispatched? Previous wisdom in clang seems to suggest dispatching according to mtriple (IIUC), so shall we introduce anything similar, for example, dispatching based on --device aarch64-foo-bar? > > This requires developers to learn more than the bare minimum (A2), as users are required to learn the specifics of how to specify a Target in TVM. If an llvm Target supports -mcpu/-mattr/-mtriple then all of these fields can be reasonably expected to allow the Target to infer features given the relation to GCC/LLVM (A1). Right. I agree. `TargetKind`-based dispatch is certainly the clearest approach as you mentioned (A2). The reason I'm thinking about (not 100% sure though) other directions is: what if different vendors (say Qualcomm and ARM) want to customize their own target parsing logic separately, while their `TargetKind` is the same `llvm`? Admittedly I don't have a perfect answer for this and would love to hear about your thoughts. > Based on LLVM, the Features inferred from a given CPU/GPU profile can be extensive already, which would lead to a drastic increase in attrs if we were to track them this way. ... Therefore I would strongly advise following the pattern (as in LLVM) of using a separate structure for this, clear boundaries provide a much more straight forward developer experience (A2) and allow us to do things as @Lunderberg suggested like plucking just features or arch to give to Passes rather than the entire Target. Thank you for detailing the challenges! I'm convinced that `target.features` may be the best way to go. A couple of more questions: - What's the difference between `target.features` and `target.arch`? If they are the same thing, I would say `features` is certainly a better name - IIUC we are not going to introduce 2-level of indirection like `target.arch.features`. Is that correct? - Is it possible to completely replace `target.attrs` with more structured configuration? For example, `target.features` for C4 in the Target RFC, while `target.hardware_limits` for maximum shared memory, etc (just a hypothetical name, please feel free to suggest better ones) > I believe the src/target/parsers folder would provide for maximum customisation given parsers can be re-used between TargetKinds and therefore serves as a good starting point? Also happy for an RFC in 6 months to move everything to a new folder if we re-factor the Targets 😸 Sounds good to me! Overall, I think we are in broad agreement, and thanks for sharing all the context! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
