junrushao1994 opened a new pull request, #11578:
URL: https://github.com/apache/tvm/pull/11578

   @vinx13 @jinhongyii and I observed a recent regression on TVM mainline: 
over-simplification in `Schedule.split` leads to information loss that 
negatively impacts search space generation.
   
   **Impact.** This affects common operators like `softmax` and even simpler 
reductions.
   
   **Example.** Consider splitting a simple reduction loop:
   
   ```python
   @T.prim_func
   def main(
       A: T.Buffer[2, "float32"],
       B: T.Buffer[2, "float32"],
       C: T.Buffer[(), "float32"],
   ) -> None:
       for i in T.serial(2):  # <= split `i` into `i_0` and `i_1`, where `i_0` is a trivial loop
           with T.block("C"):
               k = T.axis.reduce(2, i)
               with T.init():
                   C[()] = T.float32(1)
               C[()] = T.min(C[()], A[k] / B[k])
   ```
   
   Splitting loop `i` by factors `[1, 2]`, we get:
   
   ```python
   @T.prim_func
   def main(
       A: T.Buffer[2, "float32"],
       B: T.Buffer[2, "float32"],
       C: T.Buffer[(), "float32"],
   ) -> None:
       for i_0, i_1 in T.grid(1, 2):
           with T.block("C"):
               k = T.axis.reduce(2, i_1)  # <= i_0 is not part of the binding, so the system cannot tell whether i_0 is a reduction loop
               with T.init():
                   C[()] = T.float32(1)
               C[()] = T.min(C[()], A[k] / B[k])
   ```
   
   In this case, loop `i_0` is treated as a spatial loop, even though it is the 
outcome of splitting a reduction loop. However, if we change the factors from 
`[1, 2]` to `[2, 1]`, loop `i_0` becomes a reduction loop. In other words, the 
iteration property of a loop now depends on its extent.
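   To see why the classification flips, here is a toy sketch (hypothetical helper names, not the actual TVM simplifier): after the split, the binding of `k` is `i_0 * inner_extent + i_1`, and affine simplification drops any term over an extent-1 loop, so whether `i_0` survives in the binding depends entirely on the factors:
   
   ```python
   # Hypothetical sketch, NOT the actual TVM implementation: mimic how
   # affine simplification of the post-split binding `i_0 * inner + i_1`
   # drops terms over trivial (extent-1) loops.
   
   def simplified_binding(outer_extent, inner_extent):
       """Binding of reduction axis `k` after splitting loop `i` into
       `i_0` (extent=outer_extent) and `i_1` (extent=inner_extent)."""
       terms = []
       if outer_extent > 1:  # an extent-1 loop var is provably 0, so its term is dropped
           terms.append(f"i_0 * {inner_extent}")
       if inner_extent > 1:
           terms.append("i_1")
       return " + ".join(terms) if terms else "0"
   
   def classified_as_reduction(loop_var, binding):
       # A loop is recognized as a reduction loop only if its variable
       # appears in the binding of a reduction block axis.
       return loop_var in binding
   
   print(simplified_binding(1, 2))  # 'i_1'     -> i_0 looks spatial
   print(simplified_binding(2, 1))  # 'i_0 * 1' -> i_0 looks like a reduction loop
   ```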
   
   **Why is this problematic?** MetaSchedule assumes that a loop's extent should 
only extremely rarely affect the iteration property of the loop itself, i.e. 
whether the extent is 1 or 2 or anything else, the fact that the loop is a 
reduction loop should not change.
   
   As an example, `Auto-Bind` finds the outermost `k` spatial loops, fuses them 
together, and binds the fused loop to a thread axis. In the trace, the number of 
outer loops (`k`) has to be a constant.
   
   However, suppose Auto-Bind decides during search space generation that there 
are `k=3` outer loops to fuse, where the last loop happens to be a reduction 
loop with extent 1, as shown below:
   
   ```python
   for spatial_loop_0 in range(...):
     for spatial_loop_1 in range(...):
       for reduction_loop in range(1):  # <= Auto-Bind mistakes this loop for a spatial loop, because its extent is 1
   ```
   
   During evolutionary search, the extent of `reduction_loop` is mutated and 
becomes larger than 1. At that point, the binding strategy consistently fails 
because it still tries to fuse `k=3` loops, which means the entire search 
strategy fails with almost no valid candidates.
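   
   To make the failure mode concrete, here is a minimal sketch (hypothetical function names, not actual MetaSchedule code) of how a trace that recorded a fixed `k=3` keeps failing once the mutated extent exposes the third loop as a reduction loop:
   
   ```python
   # Hypothetical sketch, NOT actual MetaSchedule code: replaying a trace
   # that fuses a fixed number `k` of "outer spatial" loops.
   
   def fuse_and_bind(loop_kinds, k):
       """Fuse the first k loops and bind the result to a thread axis;
       binding a reduction loop to a thread axis is invalid."""
       if any(kind == "reduction" for kind in loop_kinds[:k]):
           raise ValueError("cannot bind a reduction loop to a thread axis")
       return "blockIdx.x"
   
   # Search space generation: the extent-1 reduction loop is misread as
   # spatial, so the trace records k = 3 and the candidate looks valid.
   assert fuse_and_bind(["spatial", "spatial", "spatial"], 3) == "blockIdx.x"
   
   # Evolutionary search mutates the split factors; the third loop now has
   # extent > 1 and is correctly seen as a reduction loop, but the trace
   # still insists on fusing k = 3 loops, so every candidate fails.
   try:
       fuse_and_bind(["spatial", "spatial", "reduction"], 3)
   except ValueError as e:
       print("candidate rejected:", e)
   ```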
   
   Thanks to @Ruihang Lai for identifying this issue, and to @jinhongyii for 
valuable pointers to the right fix!

