[GitHub] [tvm] Lunderberg commented on a diff in pull request #12720: [TIR] Implement API for padded layout transformations

GitBox Wed, 14 Sep 2022 08:59:02 -0700


Lunderberg commented on code in PR #12720:
URL: https://github.com/apache/tvm/pull/12720#discussion_r971016392



##########
python/tvm/tir/tensor_intrin/cuda.py:
##########
@@ -36,7 +36,7 @@ def shared_16x32_to_ldmatrix_32x16_layout(i, j):
 
 
 def shared_32x16_to_ldmatrix_32x16_layout(i, j):
-    thread_id = (i % 4) + 4 * (j % 8)
+    thread_id = (i % 16) // 4 + 4 * (j % 8)

Review Comment:
   Thank you for looking into it!  I wasn't able to find any tests that
   explicitly validate the transform (e.g. use the transform to generate data
   in a specific layout, then pass it through the mma).  All of the existing
   tests either started with already-transformed data, only used the 16x16
   shape, or replaced everything with the tensor intrinsic.

   I had put together [this standalone test](https://gist.github.com/Lunderberg/0c2a44de34e7e2a1d149c37b2a112f91)
   to convince myself of it.  The main issue with the current index map is
   that it doesn't map to unique locations (512 input indices map to only 128
   output indices).  This only surfaced in this PR because the PR generates
   the inverse of the index map in order to determine whether/where padding
   is required.
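
   The uniqueness claim can be checked with a few lines of plain Python.
   This is only a minimal sketch, not the linked gist: the names
   `old_layout`, `new_layout` and `count_unique_outputs` are made up for
   illustration, and the second returned index
   (`8 * (j // 8) + (i // 16) * 4 + i % 4`) is not shown in the hunk above,
   so it is an assumption about the rest of the function.

```python
from itertools import product


def old_layout(i, j):
    # thread_id as on the removed ("-") line of the hunk.
    thread_id = (i % 4) + 4 * (j % 8)
    # ASSUMPTION: this expression is not part of the hunk shown above.
    local_id = 8 * (j // 8) + (i // 16) * 4 + i % 4
    return thread_id, local_id


def new_layout(i, j):
    # thread_id as on the added ("+") line of the hunk.
    thread_id = (i % 16) // 4 + 4 * (j % 8)
    # ASSUMPTION: same unchanged expression as in old_layout.
    local_id = 8 * (j // 8) + (i // 16) * 4 + i % 4
    return thread_id, local_id


def count_unique_outputs(layout, rows=32, cols=16):
    """Count distinct (thread_id, local_id) pairs over a rows x cols buffer."""
    return len({layout(i, j) for i, j in product(range(rows), range(cols))})


if __name__ == "__main__":
    print(count_unique_outputs(old_layout))  # 128: many inputs collide
    print(count_unique_outputs(new_layout))  # 512: one output per input
```

   Under that assumption, the old map reuses `i % 4` in both outputs and
   never looks at `(i % 16) // 4`, which is why four rows collapse onto each
   output location and no inverse can exist.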



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

