junrushao1994 commented on PR #71:
URL: https://github.com/apache/tvm-rfcs/pull/71#issuecomment-1133541236

   @Mousius Thank you so much for your response! This makes lots of sense to 
me! Also, thanks for including my personal principles in the discussion! It's 
my personal principles which are completely okay to disagree with :-)
   
   > I'm not sure why we consider that pollution given it should have a 
positive impact on all groups and aids in providing customisation through a 
well defined route established by all vendors, providing simplicity and 
customisation (A1, A2 and A3).
   
   Sorry my words may lead to some potential miscommunication. As I mentioned 
in the very beginning, I completely agree with the idea of handling `arch` more 
systematically and it's important to us, so in no means I consider it 
pollution. I mean the fragmentation itself is not good but hasn't become 
extremely bad right now, because they are not used globally at infrastructural 
level, so there is no namespace polluting.
   
   > I believe this is still held given the crossover between hardware and code 
generators, for c/llvm you will still be able to pass mattr/mcpu/etc which are 
the standard way to specify CPU, ISA and extensions in existing compilers.
   
   Agreed. It is designed to be consistent with existing wisdom, i.e. LLVM/GCC 
mattr/mcpu/etc., so I believe it's not controversial.
   
   > There's definitely an existing amount of confusion in the Target system, 
but I think it's even more confused than this by looking at the tagged entries, 
such as: .... Which defines a system configuration that then uses a code 
generator llvm with a specific CPU profile, therefore the Target system can 
represent at minimum 3 distinct layers: systems/hardware/code generators.
   
   Yes, I think your understanding is correct. Per [Target 
RFC](https://discuss.tvm.apache.org/t/rfc-tvm-target-specification/6844), this 
object represents host compiler info (C0), heterogeneous compilation settings 
(C1), compiler behavior control (C2), additional ISA/library (C4). Therefore, 
the tag system (C4) is introduced to reduce the cognitive overhead, because the 
end users can use a simple plain string tag `Target("raspberry-pi/4b-aarch64")` 
to represent all C0/1/2/4, instead of understanding every detail of the 
hardware themselves (A2). In other words, to the end users, the vision of 
having tag system is that they only need a simple string to do configure TVM 
codegen on their their hardware.
   
   > where-as the cuda Target has a more complete reflection of the attrs 
available for CUDA. These inconsistencies make it difficult for a user to 
approach TVM and that isn't requiring a developers to learn the bare minimum 
(A2). It also diverges from other compilers where you'd expect mcpu/mtriple/etc 
to infer much of this information for you (A1).
   
   Right, in fact I share the same confusion with you with it comes to 
CUDA/Vulkan/etc., where (to best of my knowledge) there is no precedent of 
mcpu/mattr/mtriple/etc (except for LLVM NVPTX backend). Certainly, we should 
develop unified approach to address this confusion, which could possibly be the 
`arch` proposed in the RFC.
   
   > I don't mind using one parser, I was just making it more granular and 
avoiding stepping on peoples toes with the existing pre-processor. Either 
design seems to fulfil your principle of customisability (A3), more granularly 
means you only have to implement a small piece of customisation where-as with 
the single parser it requires some thought as to how to cater for all 
attributes within a given TargetKind parser.
   
   Thanks for sharing your thoughts! Definitely. The issue of having multiple 
parsers is that the order of their application is less explicit, so I would be 
a little bit leaning towards a single one, while the implementation could share 
common helper functions.
   
   > > Where to dispatch target parsers. Currently, the preprocessor is 
dispatched solely based on TargetKind, which is admittedly a bit limited and 
overlooked the possiblity that aarch64 and x86 may need completely different 
parsers. Therefore, here we would love to raise a question: based on which 
attribute the parser should be dispatched? Previous wisdom in clang seems to 
suggest dispatching according to mtriple (IIUC), so shall we introduce anything 
similar, for example, dispatching based on --device aarch64-foo-bar?
   >
   > This requires developers to learn more than the bare minimum (A2), as 
users are required to learn the specifics of how to specify a Target in TVM. If 
an llvm Target supports -mcpu/-mattr/-mtriple then all of these fields can be 
reasonably expected to allow the Target to infer features given the relation to 
GCC/LLVM (A1).
   
   Right. I agree. `TargetKind`-based dispatch is certainly the clearest 
approach as you mentioned (A2). The reason I'm thinking about (not 100% sure 
though) other directions is: what if different vendors (say Qualcomm and ARM) 
want to customize their own target parsing logic separately, while their 
`TargetKind` is the same `llvm`? Admittedly I don't have a perfect answer for 
this and would love to hear about your thoughts.
   
   > Based on LLVM, the Features inferred from a given CPU/GPU profile can be 
extensive already, which would lead to a drastic increase in attrs if we were 
to track them this way. ... Therefore I would strongly advise following the 
pattern (as in LLVM) of using a separate structure for this, clear boundaries 
provide a much more straight forward developer experience (A2) and allow us to 
do things as @Lunderberg suggested like plucking just features or arch to give 
to Passes rather than the entire Target.
   
   Thank you for detailing the challenges! I'm convinced that `target.features` 
may be the best way to go. A couple of more questions:
   - What's the difference between `target.features` and `target.arch`? If they 
are the same thing, I would say `features` is certainly a better name
   - IIUC we are not going to introduce 2-level of indirection like 
`target.arch.features`. Is that correct?
   - Is it possible to completely replace `target.attrs` with more structured 
configuration? For example, `target.features` for C4 in the Target RFC, while 
`target.hardware_limits` for maximum shared memory, etc (just a hypothetical 
name, please feel free to suggest better ones)
   
   > I believe the src/target/parsers folder would provide for maximum 
customisation given parsers can be re-used between TargetKinds and therefore 
serves as a good starting point? Also happy for an RFC in 6 months to move 
everything to a new folder if we re-factor the Targets 😸
   
   Sounds good to me!
   
   Overall, I think we are in broad agreement, and thanks for sharing all the 
context!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to