junrushao1994 commented on PR #11428:
URL: https://github.com/apache/tvm/pull/11428#issuecomment-1175750797

   so let me briefly introduce the workflow of Compute-Buffer-Region first, and 
then let’s think about how to make it perfectly fit in our usecase.
   
   On each use-site of a buffer, we calculate the region it is touched by the 
loops above it, and then by unionizing those regions, it’s made possible to 
trim out those regions that are not touched.
   
   As a simplest example, if the original size of a buffer in shared memory is 
`[1024, 1024]`, but only `[bx : bx + 10, by : by + 20]` is touched in a 
threadblock, then the buffer size could shrink to `[10, 20]` which is 
significantly smaller.
   
   Then to provide a slightly more complicated example, if the buffer is 
touched in two different places, where the region are `[bx : bx + 10 : by : by 
+ 20]`, `[bx + 10: bx + 20, by : by + 30]`, then we calculate the union of the 
two regions, i.e. `[bx : bx + 20, by : by + 30]`, and then do shrinking so that 
its size of `[20, 30]`.
   
   The problem this PR aims to address is that some buffer’s size actually 
depends on irrelevant outer variables, for example, the shared memory size to 
be `[0, 960 - max(bx // 18 * 128, 832)]`, where `bx` is the extent of 
`blockIdx.x`. However, it takes an approach to over-relax the region when 
seeing any undesired variables (in our case, it’s `bx`), which is less optimal.
   
   The actual solution should be: calculating the region union bound first, and 
then remove the undesired variables with something similar to analyzer's 
`const_int_bound` (note that we don’t want the bound to be strictly constant)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to