masahi opened a new pull request, #13329:
URL: https://github.com/apache/tvm/pull/13329
In anchor-block tuning, we need to manually apply `AutoInline` to some
blocks (those that are not part of the anchor subgraph). Currently, the order
in which `AutoInline` is applied to these blocks is undefined, and I've hit a
case where this is problematic.
For example, given these four blocks,
```
for i0_7, i1_7, i2_7, i3_7 in T.grid(16, 56, 56, 256):
    with T.block("compute_2"):
        i0_8, i1_8, i2_8, i3_8 = T.axis.remap("SSSS", [i0_7, i1_7, i2_7, i3_7])
        T.reads(T_subtract_1[i0_8, i1_8, i2_8, i3_8])
        T.writes(compute_3[i0_8, i1_8, i2_8, i3_8])
        compute_3[i0_8, i1_8, i2_8, i3_8] = T.q_multiply_shift(
            T_subtract_1[i0_8, i1_8, i2_8, i3_8], 1457846997, 31, 0, dtype="int32"
        )
for i0_9, i1_9, i2_9, i3_9 in T.grid(16, 56, 56, 256):
    with T.block("compute_3"):
        i0_10, i1_10, i2_10, i3_10 = T.axis.remap("SSSS", [i0_9, i1_9, i2_9, i3_9])
        T.reads(p9[i0_10, i1_10, i2_10, i3_10])
        T.writes(compute_4[i0_10, i1_10, i2_10, i3_10])
        compute_4[i0_10, i1_10, i2_10, i3_10] = T.q_multiply_shift(
            p9[i0_10, i1_10, i2_10, i3_10], 2101000910, 31, 0, dtype="int32"
        )
for i0_11, i1_11, i2_11, i3_11 in T.grid(16, 56, 56, 256):
    with T.block("T_add_2"):
        ax0, ax1, ax2, ax3 = T.axis.remap("SSSS", [i0_11, i1_11, i2_11, i3_11])
        T.reads(compute_3[ax0, ax1, ax2, ax3], compute_4[ax0, ax1, ax2, ax3])
        T.writes(T_add_2[ax0, ax1, ax2, ax3])
        T_add_2[ax0, ax1, ax2, ax3] = compute_3[ax0, ax1, ax2, ax3] + compute_4[ax0, ax1, ax2, ax3]
for i0_12, i1_12, i2_12, i3_12 in T.grid(16, 56, 56, 256):
    with T.block("compute_4"):
        i0_13, i1_13, i2_13, i3_13 = T.axis.remap("SSSS", [i0_12, i1_12, i2_12, i3_12])
        T.reads(T_add_2[i0_13, i1_13, i2_13, i3_13])
        T.writes(compute[i0_13, i1_13, i2_13, i3_13])
        compute[i0_13, i1_13, i2_13, i3_13] = T.max(T.min(T_add_2[i0_13, i1_13, i2_13, i3_13], 255), 0)
```
, we want to `AutoInline` "compute_3", "T_add_2" and "compute_4". If the
order is "T_add_2" -> "compute_3" -> "compute_4", all three blocks can be
inlined / reverse inlined into "compute_2". However, if the order is
"T_add_2" -> "compute_4" -> "compute_3", "compute_4" can be neither inlined
nor reverse inlined. This in turn can result in a buggy schedule being
generated (see the description in the test case).
We can avoid this problem by always applying `AutoInline` to the last block
after all other blocks have been processed. This ensures that the last block
can be reverse inlined.
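
To make the ordering concrete, here is a minimal sketch in plain Python. It models only the order in which blocks are visited, not TVM itself. The helper name `order_for_auto_inline` and the use of block names as strings are assumptions made for illustration; they are not taken from the patch.

```python
from typing import List, Sequence


def order_for_auto_inline(blocks: Sequence[str], last_block: str) -> List[str]:
    """Return `blocks` with `last_block` moved to the end.

    Hypothetical helper: the relative order of the other blocks is kept
    as given; only the last block is deferred.
    """
    others = [b for b in blocks if b != last_block]
    if len(others) == len(blocks):
        # `last_block` is not among the blocks to inline; nothing to defer.
        return others
    return others + [last_block]


# The problematic order from the example above becomes a safe one.
print(order_for_auto_inline(["T_add_2", "compute_4", "compute_3"], "compute_4"))
# prints ['T_add_2', 'compute_3', 'compute_4']
```

With this ordering, "compute_4" is visited only after "T_add_2" and "compute_3" have been handled, which matches the order described above as working.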