tmoreau89 commented on a change in pull request #8825:
URL: https://github.com/apache/tvm/pull/8825#discussion_r696206243
##########
File path: tutorials/optimize/opt_gemm.py
##########

```diff
@@ -227,43 +228,49 @@
 ###################################################################################################
 # Array Packing
 # -------------
-# Another important trick is array packing. This trick is to reorder the storage dimension of the
-# array to convert the continuous access pattern on certain dimension to a sequential pattern after
+# Another important trick is array packing. This trick is to reorder the storage dimension of an
+# array to convert the continuous access pattern on its dimensions to a sequential pattern after
 # flattening.
 #
 # .. image:: https://github.com/dmlc/web-data/raw/main/tvm/tutorial/array-packing.png
 #      :align: center
 #
+# NOTE: The figure above is meant for illustration purposes only. Please ignore dimension
+# information ([16][16] and [16/4][16][4]) which does not apply to this tutorial.

 ###################################################################################################
-# Just as it is shown in the figure above, after blocking the computations, we can observe the array
-# access pattern of B (after flattening), which is regular but discontinuous. We expect that after
-# some transformation we can get continuous access pattern. We can reorder a [16][16] array to
-# a [16/4][16][4] array, so that the access pattern of B will be sequential when grabing
-# the corresponding value from the packed array.
-#
+# We can use array packing to address the access pattern for B. Observe the array access pattern of
+# B after flattening which is not sequential as we iterate over the K dimension. We can reorder B
+# with dimensions [K][N] so that it has dimensions [N/bn][K][bn] where bn is the blocking factor and
+# also the vector size for both B in the inner loop. This reorder splits N into two dimensions ---
+# bigN (N/bn) and littleN (bn) --- and the new dimensions [N/bn][K][bn] match the indexing of B
+# from outer to inner loops (no, ko, ki, ni) resulting in a sequential access pattern for B after
+# flattening.
+
 # We have to re-write the algorithm slightly.
-packedB = te.compute((N / bn, K, bn), lambda x, y, z: B[y, x * bn + z], name="packedB")
+packedB = te.compute(
+    (N / bn, K, bn), lambda bigN, k, littleN: B[k, bigN * bn + littleN], name="packedB"
+)
 C = te.compute(
     (M, N),
-    lambda x, y: te.sum(A[x, k] * packedB[y // bn, k, tvm.tir.indexmod(y, bn)], axis=k),
+    lambda m, n: te.sum(A[m, k] * packedB[n // bn, k, tvm.tir.indexmod(n, bn)], axis=k),
     name="C",
 )

 s = te.create_schedule(C.op)

-xo, yo, xi, yi = s[C].tile(C.op.axis[0], C.op.axis[1], bn, bn)
+mo, no, mi, ni = s[C].tile(C.op.axis[0], C.op.axis[1], bn, bn)

 (k,) = s[C].op.reduce_axis
-ko, ki = s[C].split(k, factor=4)
+ko, ki = s[C].split(k, factor=kfactor)

-s[C].reorder(xo, yo, ko, xi, ki, yi)
-s[C].vectorize(yi)
+s[C].reorder(mo, no, ko, mi, ki, ni)
+s[C].vectorize(ni)

-x, y, z = s[packedB].op.axis
-s[packedB].vectorize(z)
-s[packedB].parallel(x)
+bigN, k, littleN = s[packedB].op.axis
```

Review comment:

```suggestion
bigN, _, littleN = s[packedB].op.axis
```
I don't think that `k` is used further. Plus we wouldn't want to confuse it with the reduction axis defined in the compute definition, see line 101. WDYT @adstraw
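As a side note for readers of this thread, the [N/bn][K][bn] packing the diff describes can be sketched in plain Python, outside of TVM. The `bigN`/`littleN` names follow the diff; the sizes below are illustrative only, not the tutorial's actual N, K, and bn.

```python
# A plain-Python sketch of the array packing described in the diff above.
# Sizes are illustrative only (hypothetical; the tutorial uses larger N, K, bn).
K, N, bn = 8, 8, 4

# B is a K x N matrix stored row-major: B[k][n] = k * N + n.
B = [[k * N + n for n in range(N)] for k in range(K)]

# Pack B[K][N] into packedB[N/bn][K][bn]:
#   packedB[bigN][k][littleN] = B[k][bigN * bn + littleN]
packedB = [
    [[B[k][bigN * bn + littleN] for littleN in range(bn)] for k in range(K)]
    for bigN in range(N // bn)
]

# The inner-loop indexing that C uses, (n // bn, k, n % bn), recovers B[k][n],
# mirroring packedB[n // bn, k, tvm.tir.indexmod(n, bn)] in the diff.
for k in range(K):
    for n in range(N):
        assert packedB[n // bn][k][n % bn] == B[k][n]

# Within one bigN block, iterating (k, littleN) in loop order is now a
# sequential walk of the packed storage, instead of striding by N through B.
flat_block0 = [v for row in packedB[0] for v in row]
assert flat_block0 == [B[k][littleN] for k in range(K) for littleN in range(bn)]
```

This is why the schedule vectorizes `littleN` (the diff's `ni`/`littleN` axes): after packing, the innermost loop reads contiguous memory.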
