Lunderberg commented on pull request #39:
URL: https://github.com/apache/tvm-rfcs/pull/39#issuecomment-961358086
In a video chat with @vinx13, we touched on a number of points, summarized
below. We are also adding @vinx13 as a co-author on this RFC.
- Are there cases where the flattening in `StorageFlatten`/`FlattenBuffer`
should be inferred from buffer properties, rather than explicitly specified by
the user? For example, if a buffer has `"texture"` scope, then we know it must
be flattened to a 2-d buffer. We concluded that this wouldn't be possible,
because the number of resulting dimensions isn't sufficient to define the
flattening being applied. For example, if a 4-d buffer is being flattened to
2-d for use in texture memory, the four initial axes `[A, B, C, D]` could be
flattened to `[A, fuse(B,C,D)]`, `[fuse(A,B), fuse(C,D)]`, or `[fuse(A,B,C),
D]`, with no one choice being clearly better than the others.
- How will buffer layout transformations be represented in TensorIR
schedules? `buffer_transform` will be a primitive transformation in TensorIR,
which is eagerly applied on the TensorIR computation.
- In all cases, this would rewrite the buffer shape, and would rewrite
loads/stores of that buffer.
- If these loads/stores occur within a series of nested loops that cover
all values of the buffer, and have no additional computation (e.g. cache
read/write) in the body of these loops, then the loops will be rewritten to be
along the transformed axes, so that the remainder of the schedule can be
written in terms of the transformed axes. Otherwise, rewriting the loops would
not be well-defined, and will not be done.
- The recommendation for use will be to apply the layout transformations
prior to any other scheduling passes that could impact the loop structure, so
that rewriting of the loops is possible.
- Should buffer flattening be implemented as a special case of layout
transformation? Buffer flattening should remain a separate concept from the
layout transforms. Whereas all other layout transformations can be performed
eagerly, and should be applied before other scheduling passes, buffer
flattening must be performed after other scheduling passes. If it were done
eagerly, other passes wouldn't have sufficient information about the structure
of the buffer.
- Is deprecating Store/Load acceptable, instead using BufferStore/BufferLoad
throughout all lowering steps? Yes, as this gives a single uniform way to
access buffers, regardless of the lowering step. The one concern is that we
should port all existing functionality. For example, the vload/vstore methods
in Buffer, which currently return Load/Store respectively, should not be
removed, and instead should be updated to return flattened
BufferLoad/BufferStore.
- RampNode should be treated as a compiler internal, and shouldn't be easily
constructible by users as indices into buffers. The preferred method to
represent vectorized access is to have a buffer access within a vectorized
loop, then allow `tir.transform.VectorizeLoop` to insert the RampNode. This
matches previous behavior, where RampNode could occur in flattened Store/Load,
while BufferLoad/BufferStore avoided RampNodes to maintain easy analysis of
accessed locations.
- Passes that change buffer dimensionality (e.g. InjectDoubleBuffer) should
either be moved before StorageFlatten/FlattenBuffer, or should be rewritten
to resize the buffer rather than changing the dimensionality. The
former would require the pass to also update the axis separators to be used
when flattening.
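
To make the flattening ambiguity from the first point concrete, here is a minimal plain-Python sketch (not TVM code). The helpers `flatten_shape` and `flatten_index` are hypothetical, and the sketch assumes that an axis separator marks the axis position at which a new flattened dimension begins; the RFC's actual representation may differ. All three separator choices produce a valid 2-d buffer from the same 4-d shape, which is why the target dimensionality alone cannot determine the flattening.

```python
from math import prod

def flatten_shape(shape, axis_separators):
    """Fuse contiguous groups of axes (hypothetical helper).

    Assumption: each entry in axis_separators is the axis position
    at which a new flattened dimension starts.
    """
    bounds = [0, *axis_separators, len(shape)]
    return [prod(shape[lo:hi]) for lo, hi in zip(bounds, bounds[1:])]

def flatten_index(shape, index, axis_separators):
    """Map an N-d index to the flattened index, row-major within each group."""
    bounds = [0, *axis_separators, len(shape)]
    flat = []
    for lo, hi in zip(bounds, bounds[1:]):
        offset = 0
        for extent, i in zip(shape[lo:hi], index[lo:hi]):
            offset = offset * extent + i
        flat.append(offset)
    return flat

shape = [2, 3, 4, 5]   # extents of axes A, B, C, D
index = [1, 2, 3, 4]   # one element of the 4-d buffer

# [1] -> [A, fuse(B,C,D)], [2] -> [fuse(A,B), fuse(C,D)], [3] -> [fuse(A,B,C), D]
for seps in ([1], [2], [3]):
    print(seps, flatten_shape(shape, seps), flatten_index(shape, index, seps))
```

The same element lands at a different 2-d location under each choice, so a pass that changes the number of axes before flattening would also have to adjust the separators.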
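
The three effects of a layout transformation described above (rewriting the buffer shape, rewriting the loads/stores, and, where well-defined, rewriting the loops) can be sketched in plain Python. This is only an illustration and does not use the TensorIR API: the index map `i -> (i // 4, i % 4)` and all names below are assumptions chosen for the example.

```python
# Hypothetical index map: split a 1-d axis of extent 16 into shape [4, 4].
def index_map(i):
    return (i // 4, i % 4)

old_shape, new_shape = (16,), (4, 4)
A_old = list(range(16))

# 1. Rewrite the buffer shape: the same values, stored under the new layout.
A_new = [[None] * new_shape[1] for _ in range(new_shape[0])]
for i in range(old_shape[0]):
    io, ii = index_map(i)
    A_new[io][ii] = A_old[i]

# Original computation: a single loop over the original axis.
out_original = [A_old[i] * 2 for i in range(16)]

# 2. Rewrite the loads only: the loop still iterates over the original axis,
#    but each access goes through the index map.
out_rewritten_access = [A_new[i // 4][i % 4] * 2 for i in range(16)]

# 3. Rewrite the loops along the transformed axes. This is possible here
#    because the loop covers every element of the buffer and its body
#    contains nothing besides the access itself.
out_rewritten_loops = [A_new[io][ii] * 2 for io in range(4) for ii in range(4)]

assert out_original == out_rewritten_access == out_rewritten_loops
```

Once the loops are in the form of step 3, later scheduling steps can refer to `io` and `ii` directly, which is the motivation for applying layout transformations before other scheduling passes.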