masahi opened a new pull request, #13338:
URL: https://github.com/apache/tvm/pull/13338

   Currently, even if a producer block has a non-trivial predicate, 
`ReverseComputeInline` goes ahead and produces a buggy schedule. Here is the 
real-world example I hit. After applying `ReverseComputeInline` to the 
`compute_4` block, the predicate (`((ax1_0 * 4 + ax1_1) * 32 + ax1_2) * 2 + 
ax1_3 < 64`) disappears. See 
https://gist.github.com/masahi/01a80b86062122ad57b9b1fd785fb960 for a repro.
   ```
            ...
            for ax0, ax1_0, ax1_1, ax1_2, ax1_3 in T.grid(32, 1, 4, 32, 2):
                with T.block("conv2d_nhwc_reindex_shared"):
                    T.where(((ax1_0 * 4 + ax1_1) * 32 + ax1_2) * 2 + ax1_3 < 64)
                    v0 = T.axis.spatial(50176, ax2_0_0_ax3_0_0_fused // 4 * 6272 + ax2_0_1_ax3_0_1_fused * 32 + ax0)
                    v1 = T.axis.spatial(256, ax2_0_0_ax3_0_0_fused % 4 * 64 + (ax1_0 * 256 + ax1_1 * 64 + ax1_2 * 2 + ax1_3))
                    T.reads(p7[()], conv2d_nhwc_reindex_shared[v0, v1], p2[0, 0, 0, v1], p3[0, 0, 0, v1], p4[v1], p5[v1], p6[v1], p8[0])
                    T.writes(compute_3[v0 // 3136, v0 % 3136 // 56, v0 % 56, v1])
                    compute_3[v0 // 3136, v0 % 3136 // 56, v0 % 56, v1] = T.q_multiply_shift(T.max(T.min(p7[()] + T.q_multiply_shift_per_axis(conv2d_nhwc_reindex_shared[v0, v1] - p2[0, 0, 0, v1] + p3[0, 0, 0, v1], p4[v1], p5[v1], p6[v1], 31, False, True, dtype="int32"), 255), 0) - p8[0], 1457846997, 31, 0, dtype="int32")
    for i0_12, i1_12, i2_12, i3_12 in T.grid(16, 56, 56, 256):
        with T.block("compute_4"):
            i0_13, i1_13, i2_13, i3_13 = T.axis.remap("SSSS", [i0_12, i1_12, i2_12, i3_12])
            T.reads(compute_3[i0_13, i1_13, i2_13, i3_13], p9[i0_13, i1_13, i2_13, i3_13])
            T.writes(compute[i0_13, i1_13, i2_13, i3_13])
            compute[i0_13, i1_13, i2_13, i3_13] = T.max(T.min(compute_3[i0_13, i1_13, i2_13, i3_13] + T.q_multiply_shift(p9[i0_13, i1_13, i2_13, i3_13], 2101000910, 31, 0, dtype="int32"), 255), 0)
   ```
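
   To make the effect of losing the predicate concrete, here is a small 
plain-Python sketch (not TVM code, and not part of this PR). It enumerates the 
`ax1_*` loops from the producer block above and collects the per-tile offset 
`ax1_0 * 256 + ax1_1 * 64 + ax1_2 * 2 + ax1_3` used in the `v1` binding, with 
and without the `T.where` guard. The helper name `tile_offsets` is made up for 
illustration.

```python
from itertools import product


def tile_offsets(apply_predicate):
    """Collect the v1 tile offsets visited by the ax1_* loops.

    Hypothetical helper: it mimics the loop nest T.grid(1, 4, 32, 2) over
    ax1_0..ax1_3 and, optionally, the T.where predicate of the producer.
    """
    offsets = []
    for ax1_0, ax1_1, ax1_2, ax1_3 in product(range(1), range(4), range(32), range(2)):
        flat = ((ax1_0 * 4 + ax1_1) * 32 + ax1_2) * 2 + ax1_3
        if apply_predicate and not flat < 64:
            continue  # iteration masked out by the predicate
        offsets.append(ax1_0 * 256 + ax1_1 * 64 + ax1_2 * 2 + ax1_3)
    return offsets


guarded = tile_offsets(apply_predicate=True)
unguarded = tile_offsets(apply_predicate=False)

# With the guard, offsets stay inside one 64-wide tile; without it they do not.
print(len(guarded), max(guarded))
print(len(unguarded), max(unguarded))
```

   With the guard, only the 64 iterations that stay within one 64-wide tile 
run. Without it, all 256 iterations run and the offset goes past the tile, 
which is why silently dropping the predicate gives a wrong schedule.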
   
   So we should disallow `ReverseComputeInline` when the producer has a 
non-trivial predicate. However, if the predicate of the new inlined block 
implies the original predicate of the producer block, `ReverseComputeInline` 
can still be applied. The test cases demonstrate both situations.
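
   The "implies" condition can be sketched in plain Python as a brute-force 
check over a small loop grid. This is only an illustration of the rule, not how 
the PR implements it. The helper `implies_on_domain` and the example predicates 
`strict_pred` and `trivial_pred` are hypothetical; only `producer_pred` is 
taken from the example above.

```python
from itertools import product


def implies_on_domain(new_pred, old_pred, extents):
    """Return True if new_pred implies old_pred at every point of the grid.

    Hypothetical helper: exhaustively checks all loop indices, which is only
    feasible for small extents like the ones in this example.
    """
    return all(
        old_pred(*idx)
        for idx in product(*(range(e) for e in extents))
        if new_pred(*idx)
    )


extents = (1, 4, 32, 2)  # extents of ax1_0, ax1_1, ax1_2, ax1_3


def producer_pred(a0, a1, a2, a3):
    # Predicate of the producer block in the example above.
    return ((a0 * 4 + a1) * 32 + a2) * 2 + a3 < 64


def strict_pred(a0, a1, a2, a3):
    # Hypothetical inlined-block predicate that is stricter than the producer's.
    return ((a0 * 4 + a1) * 32 + a2) * 2 + a3 < 32


def trivial_pred(a0, a1, a2, a3):
    # Hypothetical inlined block with no predicate at all.
    return True


# Stricter predicate implies the producer's, so inlining would be allowed.
inline_ok = implies_on_domain(strict_pred, producer_pred, extents)
# A trivial predicate does not imply it, so inlining would be rejected.
inline_rejected = not implies_on_domain(trivial_pred, producer_pred, extents)
print(inline_ok, inline_rejected)
```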
   
   @vinx13 @junrushao @Hzfengsy 

