wrongtest-intellif opened a new pull request, #14184:
URL: https://github.com/apache/tvm/pull/14184
Currently, when the `compute-at` primitive estimates the iteration domain
needed to cover the required buffer region, it relies on a limited set of
heuristic rules. This change tries to improve the estimate by using the
existing arith tool `InverseAffineIterMap`. In theory, every single buffer
access that is a bijective mapping from the block iter vars then becomes
solvable.
A new test case demonstrates that when a block represents an arbitrary
transpose/reshape op, `compute-at` on that block is expected to always succeed.
```python
from tvm.script import tir as T

@T.prim_func
def NCHW16c_to_NCHW8c(
    A: T.Buffer((1, 3, 5, 5, 16), "float32"),
    C: T.Buffer((1, 6, 5, 5, 8), "float32"),
):
    B = T.alloc_buffer((1, 3, 5, 5, 16))
    # Producer: elementwise add in the NCHW16c layout.
    for i0, i1, i2, i3, i4 in T.grid(1, 3, 5, 5, 16):
        with T.block("compute"):
            v_i0, v_i1, v_i2, v_i3, v_i4 = T.axis.remap("SSSSS", [i0, i1, i2, i3, i4])
            B[v_i0, v_i1, v_i2, v_i3, v_i4] = A[v_i0, v_i1, v_i2, v_i3, v_i4] + T.float32(1)
    # Consumer: layout transform from NCHW16c to NCHW8c.
    for ax0, ax1, ax2, ax3, ax4 in T.grid(1, 6, 5, 5, 8):
        with T.block("T_layout_trans"):
            v_ax0, v_ax1, v_ax2, v_ax3, v_ax4 = T.axis.remap("SSSSS", [ax0, ax1, ax2, ax3, ax4])
            C[v_ax0, v_ax1, v_ax2, v_ax3, v_ax4] = B[
                v_ax0, (v_ax1 * 8 + v_ax4) // 16, v_ax2, v_ax3, (v_ax1 * 8 + v_ax4) % 16
            ]
```
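To illustrate why the access in `T_layout_trans` above counts as a bijective
mapping, here is a minimal sketch in plain Python. It does not call TVM or
`InverseAffineIterMap`. The helpers `forward` and `inverse` are hypothetical
names, and the inverse formula is derived by hand for this one layout change.
Only the two axes that change (`v_ax1` and `v_ax4`) are modelled.

```python
from itertools import product

def forward(ax1, ax4):
    """Indices of B (axes 1 and 4) read by T_layout_trans for iters (ax1, ax4)."""
    fused = ax1 * 8 + ax4
    return fused // 16, fused % 16

def inverse(c1, c4):
    """Hand-derived inverse (an assumption of this sketch, not TVM output):
    the consumer iters (ax1, ax4) that read B[.., c1, .., c4]."""
    fused = c1 * 16 + c4
    return fused // 8, fused % 8

# Iteration domain of the two consumer axes, and the extent of B on axes 1 and 4.
consumer_domain = list(product(range(6), range(8)))
buffer_domain = set(product(range(3), range(16)))

images = [forward(ax1, ax4) for ax1, ax4 in consumer_domain]

# Bijective: no two iterations read the same element, and every element is read.
is_bijective = len(set(images)) == len(images) and set(images) == buffer_domain

# The inverse recovers the iteration from the buffer index it accesses.
round_trips = all(inverse(*forward(a, b)) == (a, b) for a, b in consumer_domain)

print(is_bijective, round_trips)
```

Because the mapping is one-to-one and onto, any region of `B` can be mapped
back to the exact set of block iterations that touch it, which is the kind of
access the description above calls solvable.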
This change also fixes an issue that occurs when the iter scale is -1.