wrongtest-intellif opened a new pull request, #12757:
URL: https://github.com/apache/tvm/pull/12757
The pass `PlanAndUpdateBufferAllocationLocation` appears to produce an
incorrect plan when the indices of a buffer access carry a loop-carried
dependency. For example:
```python
from tvm.script import tir as T


@T.prim_func
def test(A: T.Buffer[(8, 8), "int32"], B: T.Buffer[(8, 8), "int32"]):
    # Intermediate buffer whose allocation site is being planned.
    C = T.alloc_buffer([8, 8], "int32")
    for i in range(8):
        for j in range(8):
            with T.block("b0"):
                vi = T.axis.spatial(8, i)
                vj = T.axis.spatial(8, j)
                C[vi, vj] = A[vi, vj] + vi
        for j in range(8):
            with T.block("b1"):
                # `vi` is opaque: b1 reads row vi - 1 written by an earlier iteration of i.
                vi = T.axis.opaque(8, i)
                vj = T.axis.spatial(8, j)
                B[vi, vj] = C[vi, vj] + T.if_then_else(
                    vi > 0, C[vi - 1, vj], vi, dtype="int32"
                )
```
In block `b1`, the read of the intermediate buffer `C` at iteration `i`
depends on the writes that `b0` made to `C` at both iteration `i` and
iteration `i-1`. The allocation of `C` therefore must not be placed under
loop `i`, which is the LCA position chosen by the current planning strategy.
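To make the failure concrete, here is a minimal pure-Python sketch of the
example (no TVM required). The helper name `run_example`, the input values,
and the choice to model a fresh allocation as a zero-filled buffer are all
assumptions made for illustration; a real allocation would hold unspecified
contents.

```python
def run_example(alloc_under_loop_i):
    """Emulate the example with C allocated either outside or inside loop i."""
    # Hypothetical input data, chosen only so that rows are distinguishable.
    A = [[r * 8 + c for c in range(8)] for r in range(8)]
    B = [[0] * 8 for _ in range(8)]
    C = None
    if not alloc_under_loop_i:
        # Allocation above loop i: rows written earlier stay visible.
        C = [[0] * 8 for _ in range(8)]
    for i in range(8):
        if alloc_under_loop_i:
            # Allocation under loop i: a new buffer per iteration, so the
            # row written at iteration i - 1 is no longer available.
            C = [[0] * 8 for _ in range(8)]
        for j in range(8):  # block b0
            C[i][j] = A[i][j] + i
        for j in range(8):  # block b1
            prev = C[i - 1][j] if i > 0 else i
            B[i][j] = C[i][j] + prev
    return B


expected = run_example(alloc_under_loop_i=False)
broken = run_example(alloc_under_loop_i=True)
print(expected[1][1], broken[1][1])
```

Row 0 is identical in both variants because it never reads row `i-1`; the
results diverge from row 1 onward, which is where the loop-carried read
takes effect.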
To fix the issue, we change `DetectBufferLCA` to be aware of opaque block
iters (loop-carried dependencies and other more complex behaviors are
categorized as `opaque` in the iter type annotation).
It enforces that every legal "ancestor" of a buffer's accesses dominates
all loops related to the opaque block iters used in the buffer indices. E.g.,
since `vi` is opaque, the loop `i` must be under the planned allocation point.
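The rule can be sketched on a toy model. This is not the code of
`DetectBufferLCA`; it is an illustrative approximation in which each access
is described by the tuple of enclosing scope names (outermost first) plus the
set of loops bound to opaque iters in its indices. The names `common_prefix`
and `plan_alloc_site` and the data layout are hypothetical.

```python
def common_prefix(paths):
    """Longest shared prefix of scope paths, i.e. the plain LCA."""
    prefix = list(paths[0])
    for path in paths[1:]:
        n = 0
        while n < len(prefix) and n < len(path) and prefix[n] == path[n]:
            n += 1
        prefix = prefix[:n]
    return tuple(prefix)


def plan_alloc_site(accesses):
    """Return the scopes enclosing the allocation site.

    `accesses` is a list of (path, opaque_loops) pairs. The site starts at
    the plain LCA and is lifted above the outermost loop that some access
    indexes through an opaque iter.
    """
    site = common_prefix([path for path, _ in accesses])
    opaque = set().union(*(loops for _, loops in accesses))
    for depth, name in enumerate(site):
        if name in opaque:
            return site[:depth]  # allocate outside this loop
    return site


# Accesses to C in the example: b0 writes via spatial iters only,
# b1 reads via the opaque iter bound to loop "i".
accesses = [
    (("i", "j0", "b0"), set()),
    (("i", "j1", "b1"), {"i"}),
]
print(common_prefix([p for p, _ in accesses]))  # plain LCA: inside loop i
print(plan_alloc_site(accesses))                # lifted above loop i
```

In this model the plain LCA is `("i",)`, meaning the buffer would be
allocated inside the body of loop `i`, while the opaque-aware plan returns
`()`, meaning the allocation stays at the function root.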
For an interesting workload involving loop-carried dependencies, see
https://discuss.tvm.apache.org/t/rfc-introducing-a-rolling-buffer-scheduling-primitive/9836,
where the intermediate results of the previous iteration are reused as much
as possible.