tqchen commented on PR #94:
URL: https://github.com/apache/tvm-rfcs/pull/94#issuecomment-1264656688

   Thanks @ekalda. It is great to see us having conversations about bringing in 
SVE. The main question we want to resolve is likely going to be **what TIR 
spec, carrying the SVE information, goes into codegen**.
   
   Three alternatives have been discussed so far:
   
   ### A0: Loop with annotation but body as scalar
   
   ```python
     for (i: int32, 0, 20;i, annotation={"VLA"}) {
       C_2[i] = A_2[i] + B_2[i];
     }
   ```
   ### A1: Vectorized loop with constant vector factor 
   
   ```python
     for (i: int32, 0, 20; i) {
       C_2[ramp(i, 0, 5)] = A_2[ramp(i, 0, 5)] + B_2[ramp(i, 0, 5)];
     }
   ```
   
   ### A2: Vectorized loop with some form of TIR representation for an SVE vector
   
   ```python
     for (i: int32, 0, 20; i) {
       C_2[ramp(i, 0, vscale)] = A_2[ramp(i, 0, vscale)] + B_2[ramp(i, 0, vscale)];
     }
   ```
   
   This would involve updates to the ramp node in TIR. See the 
`kScalableVectorLaneMark` comment in the 
[previous discussion](https://github.com/apache/tvm-rfcs/pull/18).
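
To make the intended A2 semantics concrete, here is a minimal sketch in plain Python (not TVM code) of a vector-length-agnostic loop over the 20-element example. The `vl` parameter stands in for the hardware vector length, which is only known at run time, and the predicate mimics how SVE masks off lanes past the loop bound. The function name `vla_add` is hypothetical and exists only for this illustration.

```python
def vla_add(a, b, vl):
    """Emulate a predicated, vector-length-agnostic elementwise add.

    `vl` stands in for the hardware vector length in elements.
    This is an illustration only, not what TVM codegen emits.
    """
    n = len(a)
    c = [0] * n
    for base in range(0, n, vl):
        # Predicate: lane k is active only while base + k is within bounds.
        pred = [base + k < n for k in range(vl)]
        for k, active in enumerate(pred):
            if active:
                c[base + k] = a[base + k] + b[base + k]
    return c


a = list(range(20))
b = [10 * x for x in a]
# The result should not depend on vl, even when vl does not divide 20.
results = [vla_add(a, b, vl) for vl in (4, 5, 8, 16)]
```

The point of the sketch is that the same loop body is correct for any vector length, because the predicate handles the tail iteration.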
   
   ## Discussion
   The three perspectives above are meant to set the stage for discussion. This 
RFC proposes A1.
   
   A1 is a proposed change to codegen only and does not change TIR. If A1 can 
be implemented correctly, then I think it is a positive step (close to the 
S0-type change we had in other conversations), even if we want to do things in 
several stages (with follow-up S1 changes).
   
   The main question for discussion is how we can implement A1 robustly.
   
   Turning specialized code into a general form is a bit like raising (from a 
special case to the general one), so it would be good to add a high-level 
description of the pattern-match and conversion rules. For some background, I 
initially thought there might be some traps when the code contains 
specializations to the lane count, but after thinking a bit more I find that my 
initial counterexample is actually fine under A1. So I am more convinced of 
this approach.
   
   
   Something along the following lines:
   
   We would only apply the SVE specialization if the code satisfies the 
following pattern:
   
   - Pattern match all ramped loads/stores `A[ramp(iter*lanes, 0, lanes)]` to 
ensure they have the same lanes, then change lanes to VL with predication.
   - Change the outer loop iter to a vector loop.
   - If there is a vector load/store that does not satisfy the pattern, we abort.
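
The rules above can be sketched as a small check over a toy representation. This is plain Python, not the TVM API: `RampAccess` and `plan_sve_rewrite` are hypothetical names, and the model only records the iterator, the multiplier on it, and the lane count of each access. A real implementation would need to match actual TIR nodes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class RampAccess:
    """Toy stand-in for a load/store indexed as ramp(iter_var * scale, ..., lanes)."""
    buffer: str
    iter_var: str
    scale: int  # multiplier applied to the loop iterator in the ramp base
    lanes: int  # fixed lane count chosen by the schedule


def plan_sve_rewrite(loop_var: str, accesses: List[RampAccess]) -> Optional[dict]:
    """Return a rewrite plan if every access fits the pattern, else None (abort)."""
    if not accesses:
        return None
    lanes = accesses[0].lanes
    for acc in accesses:
        # Each access must look like ramp(loop_var * lanes, ..., lanes),
        # with the same lane count everywhere.
        if acc.iter_var != loop_var or acc.lanes != lanes or acc.scale != lanes:
            return None
    # Plan: replace the fixed lane count with VL and predicate the accesses.
    return {"loop_var": loop_var, "fixed_lanes": lanes,
            "new_lanes": "VL", "predicated": True}


ok = plan_sve_rewrite("i", [RampAccess("A", "i", 4, 4),
                            RampAccess("B", "i", 4, 4),
                            RampAccess("C", "i", 4, 4)])
bad = plan_sve_rewrite("i", [RampAccess("A", "i", 4, 4),
                             RampAccess("B", "i", 4, 8)])
```

Here `ok` is a plan because all three accesses agree on the lane count, while `bad` is `None` because one access uses a different lane count, so the rewrite is abandoned and the fixed-width code is kept.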
   
   
   
   
   
   
   
   
   
   

