ekalda commented on PR #104:
URL: https://github.com/apache/tvm-rfcs/pull/104#issuecomment-1847090278

   @cbalint13 @tqchen  Thank you for your input! This thread has been dormant 
for a bit, but we're still on it!
   
   > A comprehensive presentation on SVE design booth on RISCV and ARM from 
perspective of LLVM.
   The presentation captures all the design details of the SVE rationale in 
LLVM including arch comparisions.
   https://youtu.be/-ox8iJmbp0c?feature=shared (Vector Codegen / Luke Lau)
   
   Thanks for sharing this, a really nice presentation! I'm trying to think how 
RVV's features will align with this RFC proposal... I think LLVM can be a good 
source of inspiration there :) Based on my (quite basic) understanding of RVV, 
there are two features that need consideration:
   
   **1. Addressing several vectors at once (`LMUL`)**
   They have resolved it in LLVM by encoding the `LMUL` value into the 
multiplier of `vscale`. Since this proposal follows the LLVM convection in 
expressing the scalable length, it can easily be adopted in TVM. As it 
currently stands, it will be up to the schedule author to do the maths and 
figure out the correct multiplier. It would be even easier if we implemented 
both `vscale` and `vfactor`...*
   
   **2. Predication**
   If I understood it correctly, there are two ways of setting the active lanes:
     1. By providing a bitmask as a predicate to the operation - I'd expect 
LLVM RVV backend supports `llvm.get_active_lane_mask` for that purpose, so for 
this case the current proposal should work
     2. By setting the `VL` register to the number of active lanes - I suppose 
that's the feature @cbalint13  you mentioned in your previous comment? I can 
think of few options there:
          1. If it is more like a status register that will apply to several 
instructions, we can use pragmas/TensorIR block attributes
          2. If it is not an expensive operation, we can add an optional 
argument to ramps and broadcasts to indicate the active lanes
          3. The RISC-V backend in TVM should have all the information to 
translate `tir.get_active_lane_mask` to appropriate LLVM intrinsics that set 
the VL register.
   
   ### * ... implement both `vscale` and `vfactor`
   
   > Why not have both `vfactor` (abstract) concept along with `vscale` (real), 
where the `vfactor` would be a "virtual" teller of how a single true type 
`vscale` ramps ? This make the "implicit data type to be know" on one hand, and 
also would be expressive enough for "vectors with multiple vector width".
   
   Sorry I missed this before! That's a good point, I think there would be 
benefits in having both of these available. It would certainly make expressing 
multiples of vector length simpler, e.g. 
   ```
   Ramp(base, 1, 2 * vfactor)
   ```
   would imply `LMUL = 2`. If we want to keep `vfactor` as a user facing 
convenience function, we could do the translation from `vfactor` to `n * 
vscale` in the Ramp constructor so we wouldn't need to teach the TVM internal 
passes how to deal with it. 
   
   ---
   @tqchen 
   
   > Just to circle back here a bit. the main root issue is that we are using 
runtime::DataType, which is supposely being concrete through out the TIR node.
   
   > This places restrictions on what we can normally represent. A more 
comprehensive update would change the PrimExpr's field to also an object, as 
per StructInfo in the relax. That would requires bit more thinking, which 
likely can get around the issues mentioned in the thread(of passing around 
runtime::DataType which is not an object).
   
   
   I think I see what you mean and I agree, if we had something of a base type 
`Object` representing the data type that would give us much more freedom in 
expressing the compile time data type. I see how this would be a pretty 
invasive change though, I'm also not sure how this would interoperate with the 
`DLDataType` dependent runtime implementation (but I also don't know the 
runtime implementation very well).
   
   
   > I think in short term making the protocol of lanes = -1 and lanes = -8(for 
vscale(8)) may not be a bad idea. The main reason is I cannot think of another 
possible use of the lanes field other than for the SVE.
   
   I'm fine with going with this option (especially if we manage to hide the -8 
from the user) as it is probably the least invasive and sturdy option that will 
allow us to achieve our goals. @lhutton has been prototyping an additional 
field in `runtime::DataType`, but it's a bit of a can of worms (as per his post 
above).
   
   We intend to upload a draft prototype soon, then you guys will have 
something more concrete to look at :)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to