[GitHub] [tvm] masahi opened a new pull request, #13329: [MetaSchedule] Fix the order of applying `AutoInline` in `ScheduleUsingAnchorTrace`

GitBox Wed, 09 Nov 2022 00:42:57 -0800


masahi opened a new pull request, #13329:
URL: https://github.com/apache/tvm/pull/13329


   In anchor-block tuning, we need to manually apply `AutoInline` to some 
blocks (those that are not part of the anchor subgraph). Currently the order 
in which `AutoInline` is applied to these blocks is undefined, and I've hit a 
case where this is problematic. 
   
   For example, given these four blocks,
   ```
           for i0_7, i1_7, i2_7, i3_7 in T.grid(16, 56, 56, 256):
               with T.block("compute_2"):
                   i0_8, i1_8, i2_8, i3_8 = T.axis.remap("SSSS", [i0_7, i1_7, i2_7, i3_7])
                   T.reads(T_subtract_1[i0_8, i1_8, i2_8, i3_8])
                   T.writes(compute_3[i0_8, i1_8, i2_8, i3_8])
                   compute_3[i0_8, i1_8, i2_8, i3_8] = T.q_multiply_shift(T_subtract_1[i0_8, i1_8, i2_8, i3_8], 1457846997, 31, 0, dtype="int32")
           for i0_9, i1_9, i2_9, i3_9 in T.grid(16, 56, 56, 256):
               with T.block("compute_3"):
                   i0_10, i1_10, i2_10, i3_10 = T.axis.remap("SSSS", [i0_9, i1_9, i2_9, i3_9])
                   T.reads(p9[i0_10, i1_10, i2_10, i3_10])
                   T.writes(compute_4[i0_10, i1_10, i2_10, i3_10])
                   compute_4[i0_10, i1_10, i2_10, i3_10] = T.q_multiply_shift(p9[i0_10, i1_10, i2_10, i3_10], 2101000910, 31, 0, dtype="int32")
           for i0_11, i1_11, i2_11, i3_11 in T.grid(16, 56, 56, 256):
               with T.block("T_add_2"):
                   ax0, ax1, ax2, ax3 = T.axis.remap("SSSS", [i0_11, i1_11, i2_11, i3_11])
                   T.reads(compute_3[ax0, ax1, ax2, ax3], compute_4[ax0, ax1, ax2, ax3])
                   T.writes(T_add_2[ax0, ax1, ax2, ax3])
                   T_add_2[ax0, ax1, ax2, ax3] = compute_3[ax0, ax1, ax2, ax3] + compute_4[ax0, ax1, ax2, ax3]
           for i0_12, i1_12, i2_12, i3_12 in T.grid(16, 56, 56, 256):
               with T.block("compute_4"):
                   i0_13, i1_13, i2_13, i3_13 = T.axis.remap("SSSS", [i0_12, i1_12, i2_12, i3_12])
                   T.reads(T_add_2[i0_13, i1_13, i2_13, i3_13])
                   T.writes(compute[i0_13, i1_13, i2_13, i3_13])
                   compute[i0_13, i1_13, i2_13, i3_13] = T.max(T.min(T_add_2[i0_13, i1_13, i2_13, i3_13], 255), 0)
   ```
   we want to `AutoInline` "compute_3", "T_add_2" and "compute_4". If the 
order is "T_add_2" -> "compute_3" -> "compute_4", all three blocks can be 
inlined / reverse inlined into "compute_2". However, if the order is 
"T_add_2" -> "compute_4" -> "compute_3", "compute_4" can be neither inlined 
nor reverse inlined. This in turn can cause a buggy schedule to be generated 
(see the description in the test case).
   
   We can avoid this problem by always applying `AutoInline` to the last 
block only after all other blocks have been processed. This ensures that the 
last block can be reverse inlined. 
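
The ordering issue can be illustrated with a small toy model in plain Python. This is only a sketch and does not call TVM: the helper names `auto_inline_all` and `defer_last_block` are hypothetical, and the inlining rules are simplified assumptions (a non-output block is inlined into its consumers; the output block can be reverse inlined only when it has exactly one remaining producer block). The real `AutoInline` checks are more involved.

```python
def auto_inline_all(producers, output_block, order):
    """Toy model of applying AutoInline to `order`; returns blocks left un-inlined.

    `producers` maps each block name to the set of blocks it reads from.
    The rules below are simplified assumptions, not TVM's actual checks.
    """
    prods = {block: set(p) for block, p in producers.items()}
    failed = []
    for block in order:
        consumers = [b for b, p in prods.items() if block in p]
        if block != output_block and consumers:
            # "compute_inline": fold the block into every consumer.
            for consumer in consumers:
                prods[consumer].discard(block)
                prods[consumer] |= prods[block]
            del prods[block]
        elif len(prods[block]) == 1:
            # "reverse_compute_inline": fold the block into its sole producer.
            del prods[block]
        else:
            failed.append(block)
    return failed


def defer_last_block(order, last_block):
    """Move `last_block` to the end so it is processed after all other blocks."""
    return [b for b in order if b != last_block] + [last_block]


# Producer relations between the four blocks in the example above.
graph = {
    "compute_2": set(),
    "compute_3": set(),
    "T_add_2": {"compute_2", "compute_3"},
    "compute_4": {"T_add_2"},
}

good_order = ["T_add_2", "compute_3", "compute_4"]
bad_order = ["T_add_2", "compute_4", "compute_3"]

failed_good = auto_inline_all(graph, "compute_4", good_order)
failed_bad = auto_inline_all(graph, "compute_4", bad_order)
failed_fixed = auto_inline_all(
    graph, "compute_4", defer_last_block(bad_order, "compute_4")
)
```

In this model, "compute_4" fails under the second order because it still has two producer blocks when it is visited; deferring it to the end lets "compute_3" be inlined first, leaving a single producer.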


