tkonolige opened a new pull request #7935:
URL: https://github.com/apache/tvm/pull/7935


   The current sparse dense GPU kernel uses warp-level storage to cache 
data. Warp-level storage relies on shuffle intrinsics, which are slow on 
ROCm (because they actually read and write to shared memory). ROCm does provide 
intrinsics to do the correct memory management, but they are not available 
through TVM. Instead, this PR switches to using shared memory on ROCm devices. 
This makes the kernel about 2x faster.
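
   As a rough illustration of the switch described above, the choice of storage 
scope could be expressed as below. This is a minimal sketch, not the code in 
this PR: `select_cache_scope` is a hypothetical helper, and it assumes ROCm 
is the only target that needs the fallback. The strings `"warp"` and `"shared"` 
are TVM storage scope names, and `"cuda"` and `"rocm"` are TVM target kind names.

```python
def select_cache_scope(target_kind: str) -> str:
    """Pick the storage scope for the kernel's cached data.

    Hypothetical helper for illustration only; the actual condition used
    in the PR may differ.
    """
    # Assumption: warp-level storage lowers to shuffle intrinsics, which are
    # slow on ROCm, so ROCm targets fall back to shared memory.
    if target_kind == "rocm":
        return "shared"
    # Assumption: every other GPU target keeps the existing warp-level cache.
    return "warp"


if __name__ == "__main__":
    for kind in ("cuda", "rocm"):
        print(kind, "->", select_cache_scope(kind))
```

   The returned string would then be passed as the `scope` argument when the 
kernel allocates its cache buffers.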
   
   @tmoreau89 @jwfromm 
   



