masahi commented on pull request #7935: URL: https://github.com/apache/tvm/pull/7935#issuecomment-828718501
I'm planning to work on improving our GPU scan kernel using warp shuffle instructions, so I want to investigate this issue when I get there. Warp shuffle being slower than shared memory on AMD sounds surprising and counterintuitive. In the PR that introduced warp shuffle support to TVM's ROCm backend, https://github.com/apache/tvm/pull/5727, @t-vi mentioned that he got a good speedup on softmax reduction: https://github.com/apache/tvm/pull/5727#issuecomment-639109441. So I was under the impression that warp shuffle is generally a good thing on AMD too.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
