junrushao1994 commented on PR #11428: URL: https://github.com/apache/tvm/pull/11428#issuecomment-1175750797
So let me briefly introduce the workflow of Compute-Buffer-Region first, and then let's think about how to make it fit our use case. At each use site of a buffer, we calculate the region of the buffer that is touched under the loops above it. By taking the union of those regions, we can trim away the parts of the buffer that are never touched.

As the simplest example, if the original size of a buffer in shared memory is `[1024, 1024]`, but only `[bx : bx + 10, by : by + 20]` is touched in a threadblock, then the buffer can shrink to `[10, 20]`, which is significantly smaller. As a slightly more complicated example, if the buffer is touched in two different places, where the regions are `[bx : bx + 10, by : by + 20]` and `[bx + 10 : bx + 20, by : by + 30]`, then we calculate the union of the two regions, i.e. `[bx : bx + 20, by : by + 30]`, and shrink the buffer so that its size is `[20, 30]`.

The problem this PR aims to address is that some buffers' sizes actually depend on irrelevant outer variables. For example, the shared memory size could be `[0, 960 - max(bx // 18 * 128, 832)]`, where `bx` is the extent of `blockIdx.x`. However, the PR takes the approach of over-relaxing the region as soon as it sees any undesired variable (in our case, `bx`), which is suboptimal.

The actual solution should be: calculate the union bound of the regions first, and then remove the undesired variables with something similar to the analyzer's `const_int_bound` (note that we don't want the bound to be strictly constant).
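To make the two steps above concrete, here is a minimal Python sketch. It is not TVM code: the helper names (`union_regions`, `compact_shape`, `upper_bound_over`) are hypothetical, regions are stored as integer offsets relative to the symbolic bases `bx` and `by`, and the range of 144 values for `bx` is an assumption chosen only for illustration. The brute-force maximum is a stand-in for the analyzer-style bound, not how the analyzer works.

```python
from typing import Callable, List, Tuple

# One dimension of a region: (lo, hi) offsets relative to a symbolic base
# such as bx or by, i.e. the half-open range [base + lo, base + hi).
Interval = Tuple[int, int]
Region = List[Interval]


def union_regions(regions: List[Region]) -> Region:
    """Bounding box of several regions, computed dimension by dimension."""
    ndim = len(regions[0])
    return [
        (min(r[d][0] for r in regions), max(r[d][1] for r in regions))
        for d in range(ndim)
    ]


def compact_shape(region: Region) -> List[int]:
    """Shape of the shrunken buffer: the extent of each dimension."""
    return [hi - lo for lo, hi in region]


def upper_bound_over(extent_fn: Callable[[int], int], var_range: range) -> int:
    """Eliminate an undesired variable by taking the largest extent over its
    range (brute force; a stand-in for an analyzer-style bound)."""
    return max(extent_fn(v) for v in var_range)


# The two use sites from the example above, as offsets from (bx, by).
use_a = [(0, 10), (0, 20)]    # [bx : bx + 10, by : by + 20]
use_b = [(10, 20), (0, 30)]   # [bx + 10 : bx + 20, by : by + 30]

touched = union_regions([use_a, use_b])  # [bx : bx + 20, by : by + 30]
shape = compact_shape(touched)           # [20, 30]


def extent(bx: int) -> int:
    """The bx-dependent extent from the example above."""
    return 960 - max(bx // 18 * 128, 832)


# Assumed range for bx (hypothetical, not taken from the PR).
bound = upper_bound_over(extent, range(144))

print(touched, shape, bound)
```

The point of the ordering is that the union is taken while `bx` is still symbolic, so the offsets stay tight. Only the final extent is relaxed over `bx`, instead of relaxing every region as soon as `bx` shows up.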
