junrushao1994 opened a new pull request, #11578:
URL: https://github.com/apache/tvm/pull/11578
@vinx13 @jinhongyii and I observe a recent regression on TVM mainline:
over-simplification in `Schedule.split` leads to information loss that
negatively impacts search space generation.
**Impact.** This affects common operators like `softmax` and even simpler
reductions.
**Example.** Consider splitting a simple reduction loop:
```python
@T.prim_func
def main(
A: T.Buffer[2, "float32"],
B: T.Buffer[2, "float32"],
C: T.Buffer[(), "float32"],
) -> None:
for i in T.serial(2): # <= split `i` into `i_0` and `i_1`, where `i_0`
is a trivial loop
with T.block("C"):
k = T.axis.reduce(2, i)
with T.init():
C[()] = T.float32(1)
C[()] = T.min(C[()], A[k] / B[k])
```
Splitting loop `i` by factors `[1, 2]`, we get:
```python
@T.prim_func
def main(
A: T.Buffer[2, "float32"],
B: T.Buffer[2, "float32"],
C: T.Buffer[(), "float32"],
) -> None:
for i_0, i_1 in T.grid(1, 2):
with T.block("C"):
k = T.axis.reduce(2, i_1) # <= i_0 is not part of the binding,
so the system cannot tell if i_0 is a reduction loop
with T.init():
C[()] = T.float32(1)
C[()] = T.min(C[()], A[k] / B[k])
```
In this case, loop `i_0` will be considered as a spatial loop, even it’s the
outcome of splitting a reduction loop. However, if we change the factors from
`[1, 2]` to `[2, 1]`, loop `i_0` becomes a reduction loop. This means the loop
iteration property depends on the loop extent.
**Why is it problematic**? MetaSchedule has an assumption: extremely
seldomly, a loop extent would impact the iteration property of the loop itself,
i.e. no matter the extent is 1 or 2 or anything, the fact that the loop is a
reduction loop should rarely change.
As an example, `Auto-Bind` finds the outer `k` spatial loops, which are
fused together and bound to thread axis. In the trace, the number (`k`) of the
outer loops has to be a constant.
However, if Auto-Bind thinks there are `k=3` outer loops to fuse during
search space generation, where the last loop happens to be a reduction loop
with extent 1, as shown below:
```python
for spatial_loop_0 in range(...):
for spatial_loop_1 in range(...):
for reduction_loop in range(1): # <= Auto-Bind mistakes this loop as
spatial, because its extent is 1
```
During evolutionary search, the extent of reduction_loop will change and
become larger than 1. In this case, the binding strategy will consistently fail
because it considers fusing `k=3` loops - which means the entire search
strategy will fail with almost no valid candidates.
Thanks @Ruihang Lai for figuring out this issue, and @jinhongyii for
valuable pointers to the right fix!
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]