quic-sanirudh commented on PR #14161:
URL: https://github.com/apache/tvm/pull/14161#issuecomment-1451610544
> Hi @quic-sanirudh, this schedule allocates a buffer with the same volume
> as the Cartesian product of block itervar domains, regardless of the original
> buffer size, and will not perform the check because those data are not used in
> current block.

Thanks for the reply, @yzh119. If the schedule performs no correctness check and
generates the cache loop purely from the surrounding itervar domains, couldn't
that lead to incorrect code?

For example, applying `reindex_cache_read` to the following modification of your
initial example would probably produce incorrect code:
```python
from tvm.script import tir as T


@T.prim_func
def func(a: T.handle, b: T.handle, c: T.handle) -> None:
    A = T.match_buffer(a, (129, 129))
    B = T.match_buffer(b, (128, 128))
    C = T.match_buffer(c, (129, 129))
    # Block "B" reads only A[1:129, 1:129].
    for i, j in T.grid(128, 128):
        with T.block("B"):
            vi, vj = T.axis.remap("SS", [i, j])
            B[vi, vj] = A[vi + 1, vj + 1] * 2.0
    # Block "C" reads all of A, including row 0 and column 0.
    for i, j in T.grid(129, 129):
        with T.block("C"):
            vi, vj = T.axis.remap("SS", [i, j])
            C[vi, vj] = A[vi, vj] * 3.0
```
When we apply `sch.reindex_cache_read("B", 0, "shared", lambda i, j: (j, i))`,
we get the IR below:
```python
# from tvm.script import ir as I
# from tvm.script import tir as T

@I.ir_module
class Module:
    @T.prim_func
    def main(A: T.Buffer((129, 129), "float32"), B: T.Buffer((128, 128), "float32"), C: T.Buffer((129, 129), "float32")):
        # with T.block("root"):
        A_shared = T.alloc_buffer((128, 128), scope="shared")
        for i, j in T.grid(128, 128):
            with T.block("A_shared"):
                vi, vj = T.axis.remap("SS", [i, j])
                T.reads(A[vi + 1, vj + 1])
                T.writes(A_shared[vj, vi])
                A_shared[vj, vi] = A[vi + 1, vj + 1]
        for i, j in T.grid(128, 128):
            with T.block("B"):
                vi, vj = T.axis.remap("SS", [i, j])
                T.reads(A_shared[vj, vi])
                T.writes(B[vi, vj])
                B[vi, vj] = A_shared[vj, vi] * T.float32(2)
        vj_1 = T.int32()
        vi_1 = T.int32()
        for i, j in T.grid(129, 129):
            with T.block("C"):
                vi, vj = T.axis.remap("SS", [i, j])
                T.reads(A_shared[vj_1, vi_1])
                T.writes(C[vi, vj])
                C[vi, vj] = A_shared[vj_1, vi_1] * T.float32(3)
```
Note that the iteration domain of block `"A_shared"` is `(128, 128)`, but
block `"C"`, which now also reads `A_shared`, iterates over `(129, 129)`.
The cache only holds `A[1:129, 1:129]`, so row 0 and column 0 of `A` are never
copied into it. This means the values `C[0, j]` for `0 <= j <= 128` and
`C[i, 0]` for `0 <= i <= 128` would be invalid.
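
To make the mismatch concrete, here is a small index-coverage check in plain Python (no TVM needed). It is only a sketch of the reasoning above, not something the schedule primitive does: it compares the elements of `A` that the cache stage copies with the elements that block `"C"` reads. The names `cached`, `needed_by_c`, `missing`, and `cache_index` are made up for illustration.

```python
# Index-coverage check for the example above (plain Python, no TVM required).
N_CACHE = 128  # iteration domain of block "A_shared", inherited from block "B"
N_C = 129      # iteration domain of block "C"

# Elements of A copied into the cache: A_shared[vj, vi] = A[vi + 1, vj + 1]
cached = {(vi + 1, vj + 1) for vi in range(N_CACHE) for vj in range(N_CACHE)}

# Elements of A that block "C" reads: C[vi, vj] = A[vi, vj] * 3.0
needed_by_c = {(vi, vj) for vi in range(N_C) for vj in range(N_C)}

# Elements that "C" needs but the cache never received.
missing = needed_by_c - cached


def cache_index(r, c):
    """Hypothetical helper: position of A[r, c] inside A_shared, or None if uncached."""
    if (r, c) not in cached:
        return None
    # A[r, c] was written by the iteration (vi, vj) = (r - 1, c - 1),
    # and the index map (i, j) -> (j, i) stores it at A_shared[vj, vi].
    return (c - 1, r - 1)


print(len(missing))                                # 257
print(all(r == 0 or c == 0 for r, c in missing))   # True
```

The 257 missing elements are exactly row 0 and column 0 of `A` (129 + 129 - 1), which matches the set of `C` entries listed above.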