roastduck opened a new pull request #5307: [TIR] Make lower_warp_memory support extent(threadIdx.x) < warp_size
URL: https://github.com/apache/incubator-tvm/pull/5307

Pass `lower_warp_memory` lowers memory bound to the "warp" scope into the warp shuffle intrinsic. Currently, this pass only supports the case where the extent of `threadIdx.x` equals the warp size. However, CUDA's `__shfl` has a 3rd parameter, `width`, that restricts the shuffle to half (or 1/4, 1/8, 1/16) of a warp. This PR uses this extra parameter to enable Pass `lower_warp_memory` when the extent of `threadIdx.x` is less than the warp size.

Changes:

1. Add a 3rd parameter `width` and a 4th parameter `warp_size` to the TVM intrinsic `tvm_warp_shuffle`. The 4th parameter, `warp_size`, helps a code generator decide whether a given `width` is legal. For example, the OpenCL backend does not support the `width` parameter, so it has to check that `width == warp_size`. Since `lower_warp_memory` is currently the only pass that uses `tvm_warp_shuffle`, this change will not break any dependencies.
2. Modify the code generators that lower `tvm_warp_shuffle`. Currently, only two code generators are affected: CUDA and OpenCL.
3. In `lower_warp_memory`, first find the value of `width`, then alter the IR based on `width` instead of `warp_size`, and finally generate the modified `tvm_warp_shuffle` intrinsic.
4. Add a test that runs `lower_warp_memory` with half the warp size.

Could @tqchen, @ZihengJiang or @ajtulloch review this, or suggest other reviewers?
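To make the role of `width` concrete, here is a small Python model of the shuffle semantics this PR relies on. It is a sketch, not TVM code: `warp_shuffle` and `width_is_legal` are hypothetical helpers. The model assumes a full participation mask and a non-negative source lane, and it follows the CUDA rule that, for a `width` smaller than the warp, each lane reads from lane `srcLane % width` of its own `width`-sized sub-group. `width_is_legal` only mirrors the check the PR describes for backends without a `width` operand; it is not the actual codegen logic.

```python
from typing import List, Sequence


def warp_shuffle(values: Sequence[int], src_lane: int, width: int) -> List[int]:
    """Simulate CUDA's __shfl_sync(0xffffffff, var, srcLane, width) for one warp.

    values[i] is the value of `var` held by lane i, so len(values) plays the
    role of the warp size. The warp is split into len(values) // width
    independent sub-groups; each lane reads `var` from lane
    (src_lane % width) of its own sub-group.
    """
    warp_size = len(values)
    if width < 1 or width & (width - 1) or width > warp_size:
        raise ValueError("width must be a power of two in [1, warp_size]")
    if src_lane < 0:
        raise ValueError("src_lane must be non-negative in this sketch")
    result = []
    for lane in range(warp_size):
        group_start = lane - lane % width  # first lane of this lane's sub-group
        result.append(values[group_start + src_lane % width])
    return result


def width_is_legal(width: int, warp_size: int, backend_supports_width: bool) -> bool:
    """Hypothetical codegen-side check: a backend with no `width` operand
    (the PR cites OpenCL) can only lower a full-warp shuffle."""
    return backend_supports_width or width == warp_size


# Assumed scenario: threadIdx.x has extent 16 on a 32-lane warp, so
# width = 16 and the warp holds two independent groups of 16 lanes.
lanes = list(range(32))  # lane i holds the value i
out = warp_shuffle(lanes, 3, width=16)
print(out[0], out[15], out[16], out[31])  # 3 3 19 19
print(width_is_legal(16, 32, backend_supports_width=False))  # False
```

With `width = 16`, a read of element 3 returns lane 3's value in the first half-warp and lane 19's value in the second, so each group behaves as if it had its own 16-lane warp. A backend that cannot express `width` can only accept the full-warp case, which is why the intrinsic also carries `warp_size`.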
This is an automated message from the Apache Git Service.
