[GitHub] [tvm] masahi commented on pull request #7935: [SPARSE] Improve sparse performance on ROCM

GitBox Tue, 27 Apr 2021 17:09:40 -0700


masahi commented on pull request #7935:
URL: https://github.com/apache/tvm/pull/7935#issuecomment-828042725



   This post says: "They (`ds_permute` and `ds_bpermute` instructions) use LDS 
hardware to route data between the 64 lanes of a wavefront, but they don’t 
actually write to an LDS location"
   https://gpuopen.com/learn/amd-gcn-assembly-cross-lane-operations/
   
   If both approaches go through the shared memory (LDS) hardware, I wonder why 
the explicit shared-memory approach in this PR is faster.
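
   For reference, here is a rough Python model of the lane routing the quoted post describes. It is only a sketch under stated assumptions: the function names mirror the instruction names, the address operand is taken to be a byte address (lane id times 4), and the EXEC mask, out-of-range addresses, and hardware timing are all ignored. `wavefront_sum` is a hypothetical helper showing how a cross-lane reduction could be built from `ds_bpermute`; it is not the code in this PR and says nothing about which approach is faster.

```python
WAVEFRONT_SIZE = 64  # lanes per wavefront, as in the quoted post


def ds_bpermute(addr, src):
    """Backward permute ("pull"): lane i reads the src value held by lane addr[i] // 4.

    Assumption: addr holds byte addresses, so lane k is addressed as k * 4.
    """
    return [src[(addr[i] // 4) % WAVEFRONT_SIZE] for i in range(WAVEFRONT_SIZE)]


def ds_permute(addr, src):
    """Forward permute ("push"): lane i sends its src value to lane addr[i] // 4.

    Lanes that receive nothing keep 0 in this model; if several lanes target the
    same destination, this model simply keeps the last write (a simplification).
    """
    dst = [0] * WAVEFRONT_SIZE
    for i in range(WAVEFRONT_SIZE):
        dst[(addr[i] // 4) % WAVEFRONT_SIZE] = src[i]
    return dst


def wavefront_sum(values):
    """Hypothetical butterfly reduction built on ds_bpermute.

    Each step exchanges values with the lane whose id differs in one bit, so
    after log2(64) = 6 steps every lane holds the sum of all 64 inputs.
    """
    vals = list(values)
    offset = WAVEFRONT_SIZE // 2
    while offset >= 1:
        addr = [(lane ^ offset) * 4 for lane in range(WAVEFRONT_SIZE)]
        partner = ds_bpermute(addr, vals)
        vals = [a + b for a, b in zip(vals, partner)]
        offset //= 2
    return vals
```

   In this model no value is ever stored to a shared-memory array; data only moves between the per-lane lists, which matches the post's statement that the instructions route data through the LDS hardware without writing to an LDS location.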


