[GitHub] [tvm] masahi edited a comment on pull request #7935: [SPARSE] Improve sparse performance on ROCM

GitBox Tue, 27 Apr 2021 18:04:29 -0700


masahi edited a comment on pull request #7935:
URL: https://github.com/apache/tvm/pull/7935#issuecomment-828042725



   This post says: "They (`ds_permute` and `ds_bpermute` instructions) use LDS 
hardware to route data between the 64 lanes of a wavefront, but they don’t 
actually write to an LDS location". I don't know what they mean by "route 
without actually writing".
   https://gpuopen.com/learn/amd-gcn-assembly-cross-lane-operations/
   
   If both approaches use shared memory (LDS), I wonder why the explicit 
approach in this PR is faster.
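
   To make the lane-routing semantics in the quoted passage concrete, here is a small Python model of the two instructions as I understand them from the GPUOpen article. It is a sketch only: the function names just mirror the instruction names, the helper constant is my own, and it models which lane ends up with which value, not how the LDS hardware moves the data or how fast it is.

```python
WAVEFRONT_SIZE = 64  # lanes per wavefront on GCN

def ds_bpermute(addr, src):
    """Backward permute ("pull"): lane i reads the value held by lane addr[i] / 4.

    addr holds byte addresses, so a lane index is multiplied by 4.
    """
    return [src[(addr[i] // 4) % WAVEFRONT_SIZE] for i in range(WAVEFRONT_SIZE)]

def ds_permute(addr, src):
    """Forward permute ("push"): lane i sends its value to lane addr[i] / 4.

    Lanes that nobody writes to keep 0; if several lanes target the same
    destination, the highest-numbered source lane wins (assumed here by
    iterating in ascending lane order).
    """
    dst = [0] * WAVEFRONT_SIZE
    for i in range(WAVEFRONT_SIZE):
        dst[(addr[i] // 4) % WAVEFRONT_SIZE] = src[i]
    return dst

# Example: every lane pulls the value from its right-hand neighbour (with wrap-around).
src = [100 + lane for lane in range(WAVEFRONT_SIZE)]
rotate_addr = [4 * ((lane + 1) % WAVEFRONT_SIZE) for lane in range(WAVEFRONT_SIZE)]
pulled = ds_bpermute(rotate_addr, src)
pushed = ds_permute(rotate_addr, src)
```

   In this model `pulled[i]` is the value of lane `i + 1`, while `pushed[i + 1]` is the value of lane `i`. Neither result is ever stored at an addressable shared-memory location, which may be what the article means by "route without actually writing".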


