adstraw commented on a change in pull request #8825:
URL: https://github.com/apache/tvm/pull/8825#discussion_r696743930



##########
File path: tutorials/optimize/opt_gemm.py
##########
@@ -227,43 +228,49 @@
 
###################################################################################################
 # Array Packing
 # -------------
-# Another important trick is array packing. This trick is to reorder the storage dimension of the
-# array to convert the continuous access pattern on certain dimension to a sequential pattern after
+# Another important trick is array packing. This trick is to reorder the storage dimension of an
+# array to convert the continuous access pattern on its dimensions to a sequential pattern after
 # flattening.
 #
 # .. image:: https://github.com/dmlc/web-data/raw/main/tvm/tutorial/array-packing.png
 #      :align: center
 #
+# NOTE: The figure above is meant for illustration purposes only. Please ignore the dimension
+# information ([16][16] and [16/4][16][4]), which does not apply to this tutorial.
 
 
 
###################################################################################################
-# Just as it is shown in the figure above, after blocking the computations, we can observe the array
-# access pattern of B (after flattening), which is regular but discontinuous. We expect that after
-# some transformation we can get continuous access pattern. We can reorder a [16][16] array to
-# a [16/4][16][4] array, so that the access pattern of B will be sequential when grabbing
-# the corresponding value from the packed array.
-#
+# We can use array packing to address the access pattern of B. Observe that the array access
+# pattern of B after flattening is not sequential as we iterate over the K dimension. We can
+# reorder B with dimensions [K][N] so that it has dimensions [N/bn][K][bn], where bn is both the
+# blocking factor and the vector size for B in the inner loop. This reorder splits N into two
+# dimensions --- bigN (N/bn) and littleN (bn) --- and the new dimensions [N/bn][K][bn] match the
+# indexing of B from outer to inner loops (no, ko, ki, ni), resulting in a sequential access
+# pattern for B after flattening.
+
 
 # We have to re-write the algorithm slightly.
-packedB = te.compute((N / bn, K, bn), lambda x, y, z: B[y, x * bn + z], name="packedB")
+packedB = te.compute(
+    (N / bn, K, bn), lambda bigN, k, littleN: B[k, bigN * bn + littleN], name="packedB"
+)
 C = te.compute(
     (M, N),
-    lambda x, y: te.sum(A[x, k] * packedB[y // bn, k, tvm.tir.indexmod(y, bn)], axis=k),
+    lambda m, n: te.sum(A[m, k] * packedB[n // bn, k, tvm.tir.indexmod(n, bn)], axis=k),
     name="C",
 )
 
 s = te.create_schedule(C.op)
 
-xo, yo, xi, yi = s[C].tile(C.op.axis[0], C.op.axis[1], bn, bn)
+mo, no, mi, ni = s[C].tile(C.op.axis[0], C.op.axis[1], bn, bn)
 (k,) = s[C].op.reduce_axis
-ko, ki = s[C].split(k, factor=4)
+ko, ki = s[C].split(k, factor=kfactor)
 
-s[C].reorder(xo, yo, ko, xi, ki, yi)
-s[C].vectorize(yi)
+s[C].reorder(mo, no, ko, mi, ki, ni)
+s[C].vectorize(ni)
 
-x, y, z = s[packedB].op.axis
-s[packedB].vectorize(z)
-s[packedB].parallel(x)
+bigN, k, littleN = s[packedB].op.axis

Review comment:
       Looks like we only need the definition of `k` on line 101. We don't need to re-define `k` with `(k,) = s[C].op.reduce_axis` anywhere in the script, including line 265. I am able to remove all such lines and the tutorial still functions with no issue. That is, provided I don't redefine `k` when I grab the axes from `packedB`, as @tmoreau89 mentions above.
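
For readers following along, the [K][N] -> [N/bn][K][bn] relayout discussed in this hunk can be modeled without TVM at all. The sketch below is a pure-Python illustration (not the tutorial's TE code); the sizes `K=4, N=8, bn=4, kfactor=2` and the helper `flat_packed_index` are assumptions chosen only to make the flattened addresses easy to check. It walks the loop nest in the schedule's order (no, ko, ki, ni) and shows that, for a fixed `no`, the flattened accesses into packedB are strictly sequential:

```python
# Model of the array-packing relayout: packedB[bigN][k][littleN] = B[k][bigN*bn + littleN].
# Small illustrative sizes; these are NOT the tutorial's actual M/N/K.
K, N, bn, kfactor = 4, 8, 4, 2

# B stored as [K][N]; each element records its own row-major flat address in B.
B = [[k * N + n for n in range(N)] for k in range(K)]

# The packed layout, mirroring the te.compute lambda in the diff above.
packedB = [
    [[B[k][bigN * bn + littleN] for littleN in range(bn)] for k in range(K)]
    for bigN in range(N // bn)
]

def flat_packed_index(bigN, k, littleN):
    # Row-major flattening of the [N/bn][K][bn] packed array.
    return (bigN * K + k) * bn + littleN

# Iterate in the schedule's loop order (no, ko, ki, ni) for one value of `no`
# and record which flattened packedB addresses are touched.
no = 1
addresses = []
for ko in range(K // kfactor):
    for ki in range(kfactor):
        for ni in range(bn):
            addresses.append(flat_packed_index(no, ko * kfactor + ki, ni))

# Within a fixed `no`, the addresses form one contiguous, strictly increasing run.
print(addresses == list(range(no * K * bn, (no + 1) * K * bn)))  # True
```

The same walk over the unpacked [K][N] layout would touch address `k * N + no * bn + ni`, which jumps by N on every step of k; the packed layout is what collapses those jumps into a unit stride.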




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
