sjoerdmeijer commented on pull request #18:
URL: https://github.com/apache/tvm-rfcs/pull/18#issuecomment-917576240
Thanks for commenting, @tqchen.
Could you clarify a few things further for me, please? See my remarks inline.
> Thanks @MeeraN7 . Yes I get what you mean. Right now we are adding a
"is_scalable" field to indicate that the broadcast and ramp are "context
dependent" on VL. Additionally, we might need to update DataType to indicate a
scalable data type.
>
> This context dependency is the missing information I mentioned here.
I don't think I understand what you mean by context dependency. In my view of
the world, the Ramp node means that we can process elements in a data-parallel
way; how exactly that is done is up to the backends, depending on the
architecture and the code. What we are doing here is annotating this Ramp node
with a bit of state: a hint that we want a special kind of vectorisation. From
this point of view it is syntactic sugar: if the hint is dropped or ignored,
the code can still be vectorised, just not in a vector-length-agnostic way.
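To illustrate that view, here is a minimal sketch in Python (the class and function names are hypothetical, not the actual TVM API): a Ramp-like node carrying one extra boolean, where a backend that ignores the hint simply treats the vector length multiplier as 1 and falls back to fixed-width vectorisation.

```python
from dataclasses import dataclass

# Hypothetical sketch of the proposed extension; the real TVM Ramp node
# lives in tvm.tir and does not have this exact interface.
@dataclass
class Ramp:
    base: int
    stride: int
    lanes: int                  # elements per "VL chunk" when scalable
    is_scalable: bool = False   # hint: request vector-length-agnostic codegen

def elements(ramp, vscale=1):
    """Concrete indices covered by the ramp.

    For a scalable ramp, the hardware's runtime 'vscale' multiplies the
    lane count; a backend that drops the hint treats vscale as 1 and
    still produces a correct (fixed-width) vectorisation.
    """
    n = ramp.lanes * (vscale if ramp.is_scalable else 1)
    return [ramp.base + i * ramp.stride for i in range(n)]
```

Dropping `is_scalable` never changes correctness, only whether the lane count is a compile-time constant, which is the sense in which the flag is syntactic sugar.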
I don't think we need to encode much more information than one boolean that
specifies the vectorisation style. You could indeed argue that this is ad-hoc,
but if we are going to do it in a different way, we would still need to keep a
bit of state around, like the `annotation={"VLA"}` example that you gave
earlier. From that point of view, I don't see any difference.
> The set of code is really undefined and should be parameterized by VL.
What do you mean by undefined here?
> Additionally, considering the case of two loops with different VL1 and VL2
and want to do some transformations, we might fall into the trap of thinking
them as same type(because only "is_scalable" is marked) but in reality they are
not, as a implicit dependency on VL can be ignored.
>
> I can understand that the additional flag can be helpful as we could reuse
some of the vectorization logic. However, the "is_scalable" field might
introduce additional confusion as above, and the additional ramp node may not
carry too much additional information(apart from the fact that we use a scalar
vs a vector type). So my main question is that whether or not we could use a
separate normal form to hint the code generator without changing the current
DataType, ramp and broadcast.
Correct me if I am wrong, but your assumption seems to be that explicitly
encoding scalable state is bad. Looking at your example:
> **N1: A possible loop normal form via annotation**
>
> ```
> for (i: int32, 0, 17;i, annotation={"VLA"}) {
> C_2[i] = A_2[i] + B_2[i];
> }
> ```
This annotation looks equivalent to a loop pragma. In Clang, for example, you
can write:
```
#pragma clang loop vectorize_width(4, scalable)
for (i = 0; i < 17; ++i) {
  C_2[i] = A_2[i] + B_2[i];
}
```
and thus request scalable vectorisation. If this annotation results in
scalable vectorisation, the LLVM vectoriser will lower this to operations on
`<vscale x 4 x i32>` scalable vector IR types.
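To make the vector-length-agnostic lowering concrete, here is a sketch in Python of the loop structure the vectoriser produces for such a pragma (the `vscale` parameter and predicate handling are illustrative, not an actual API): the chunk width `4 * vscale` is only known at run time, so the same code runs on any hardware vector length.

```python
def vla_add(a, b, vscale):
    """Vector-length-agnostic C[i] = A[i] + B[i] over len(a) elements.

    Mirrors the shape of a scalable-vectorised loop: the vector length
    4 * vscale (cf. LLVM's <vscale x 4 x i32>) is a runtime quantity,
    and a predicate masks off the lanes past the trip count in the
    final iteration, so no scalar tail loop is needed.
    """
    n = len(a)           # trip count, e.g. 17 in the example above
    vl = 4 * vscale      # runtime vector length
    c = [0] * n
    i = 0
    while i < n:
        active = min(vl, n - i)          # predicate: in-bounds lanes only
        for lane in range(active):       # one predicated vector operation
            c[i + lane] = a[i + lane] + b[i + lane]
        i += vl
    return c
```

The point is that nothing in the loop body hardcodes the vector width, which is exactly what distinguishes the scalable style from fixed-width vectorisation.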
What I would like to say with these examples is that there are two things
going on at different levels. I think:
- The `annotation={"VLA"}` corresponds to a loop pragma in Clang,
- The TIR Ramp node extension corresponds to LLVM IR scalable types, e.g.
`<vscale x 4 x i32>`.
And I think these two concepts are different things that both have their value
and place.
If we can only annotate loops as being scalable, we lose the finer-grained
control to request this at the statement level. I don't know if mixing fixed
and scalable vectorisation will be an important use case, but I think it is
possible.
Summarising, I don't think explicitly encoding scalability in TIR nodes is a
bad thing; on the contrary, I think we need it, and the annotation on the loop
could be a complementary technique to it.
What do you think?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]