tqchen commented on PR #94:
URL: https://github.com/apache/tvm-rfcs/pull/94#issuecomment-1264656688
Thanks @ekalda. It is great to see us having conversations on bringing in
SVE. The main question we want to resolve is likely going to be **what TIR
spec goes into codegen to carry the SVE info**.
Three alternatives have been discussed so far:
### A0: Loop with annotation but body as scalar
```python
for (i: int32, 0, 20;i, annotation={"VLA"}) {
C_2[i] = A_2[i] + B_2[i];
}
```
### A1: Vectorized loop with constant vector factor
```python
for (i: int32, 0, 20; i) {
C_2[ramp(i, 0, 5)] = A_2[ramp(i, 0, 5)] + B_2[ramp(i, 0, 5)];
}
```
### A2: Vectorized loop with some form of TIR repr for sve vector
```python
for (i: int32, 0, 20; i) {
C_2[ramp(i, 0, vscale)] = A_2[ramp(i, 0, vscale)] + B_2[ramp(i, 0, vscale)];
}
```
This would involve updates to the ramp node in TIR. See the
`kScalableVectorLaneMark` comment in the [previous
discussion](https://github.com/apache/tvm-rfcs/pull/18).
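
To make the difference between a fixed lane count and a scalable one concrete, here is a minimal plain-Python sketch (not TVM code) of what a vector-length-agnostic loop computes. The names `vla_add` and `vl` are hypothetical: `vl` stands in for the hardware vector length that is only known at run time, and the per-lane `active` mask models predication of the loop tail.

```python
def vla_add(a, b, vl):
    """Simulate C = A + B with a vector-length-agnostic, predicated loop.

    `vl` is a hypothetical stand-in for the run-time vector length.
    """
    n = len(a)
    c = [0] * n
    for base in range(0, n, vl):
        # Predicate: a lane is active only if its element is in bounds.
        active = [base + lane < n for lane in range(vl)]
        for lane in range(vl):
            if active[lane]:
                c[base + lane] = a[base + lane] + b[base + lane]
    return c


a = list(range(20))
b = [10 * x for x in a]
expected = [x + y for x, y in zip(a, b)]

# The result must not depend on the vector length chosen at run time,
# including lengths that do not divide the extent (8 and 16 here).
results = {vl: vla_add(a, b, vl) for vl in (4, 5, 8, 16)}
```

The point of the sketch is only that the same loop body is valid for any `vl`; how that is expressed in TIR is exactly what A0, A1, and A2 differ on.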
## Discussion
The three alternatives above are meant to set the stage for discussion. This RFC
proposes A1, because it is a change to codegen only and does not change TIR.
If A1 can be implemented correctly, then I think it is a positive step (close
to the S0-type change we had in other conversations), even if we want to do things
in several stages (with follow-up S1 changes).
The main question for discussion is how we can implement A1 robustly.
Turning specialized code into general code is a bit like raising
(from a special case to the general one), so it would be good to add a high-level
description of the pattern-match and conversion rules. For some
background: initially I thought there might be some traps when the code
contains specializations to the lane count, but after thinking a bit more I find
that my initial counterexample is actually fine under A1. So I am more
convinced of this approach.
Something along the following lines:
We would only turn on SVE specialization if the code satisfies the following
pattern:
- Pattern match all ramped loads/stores `A[ramp(iter*lanes, 0, lanes)]` to
ensure they have the same lanes, then change the lanes to VL with predication.
- Change the outer loop iter to a vector loop.
- If there is a vector load/store that does not satisfy the pattern, we abort.
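
The matching rule above could be sketched roughly as follows. This is a toy model in plain Python, not TVM's API: `Ramp`, `Access`, and `match_sve_pattern` are hypothetical names introduced for illustration, the ramp stride is not modeled, and the access list is assumed to contain only the vector loads/stores of the loop body.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Ramp:
    """Toy index of the form ramp(base_var * base_scale, <stride>, lanes)."""
    base_var: str
    base_scale: int
    lanes: int


@dataclass(frozen=True)
class Access:
    """Toy vector load/store; a non-Ramp index means some other index shape."""
    kind: str      # "load" or "store"
    buffer: str
    index: object


def match_sve_pattern(loop_var: str, accesses: List[Access]) -> Optional[int]:
    """Return the common lane count if every access matches, else None (abort)."""
    lanes = None
    for acc in accesses:
        idx = acc.index
        if not isinstance(idx, Ramp):
            return None  # vector access with an unsupported index shape
        if idx.base_var != loop_var or idx.base_scale != idx.lanes:
            return None  # base is not iter * lanes
        if lanes is None:
            lanes = idx.lanes
        elif idx.lanes != lanes:
            return None  # accesses disagree on the lane count
    return lanes


good = [
    Access("load", "A", Ramp("i", 5, 5)),
    Access("load", "B", Ramp("i", 5, 5)),
    Access("store", "C", Ramp("i", 5, 5)),
]
mixed_lanes = good[:2] + [Access("store", "C", Ramp("i", 4, 4))]
bad_base = good[:2] + [Access("store", "C", Ramp("i", 1, 5))]
```

When `match_sve_pattern` returns a lane count, a codegen pass would replace that constant with VL and predicate the accesses; when it returns `None`, the loop would be left to the existing fixed-width path.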