masahi commented on pull request #7935:
URL: https://github.com/apache/tvm/pull/7935#issuecomment-828718501


I'm planning to improve our GPU scan kernel using warp shuffle
instructions, so I'd like to investigate this issue when I get there. Warp
shuffle being slower than shared memory on AMD sounds surprising and
counterintuitive. In the PR that introduced warp shuffle support to TVM's
ROCm backend, https://github.com/apache/tvm/pull/5727, @t-vi mentioned that
he got a good speedup on softmax reduction:
https://github.com/apache/tvm/pull/5727#issuecomment-639109441. So I was under
the impression that warp shuffle is generally a good thing on AMD too.



