hogepodge commented on a change in pull request #8825:
URL: https://github.com/apache/tvm/pull/8825#discussion_r696753465
##########
File path: tutorials/optimize/opt_gemm.py
##########
@@ -293,23 +300,26 @@
# Allocate write cache
CC = s.cache_write(C, "global")
-xo, yo, xi, yi = s[C].tile(C.op.axis[0], C.op.axis[1], bn, bn)
+mo, no, mi, ni = s[C].tile(C.op.axis[0], C.op.axis[1], bn, bn)
-# Write cache is computed at yo
-s[CC].compute_at(s[C], yo)
+# Write cache is computed at no
+s[CC].compute_at(s[C], no)
# New inner axes
-xc, yc = s[CC].op.axis
+mc, nc = s[CC].op.axis
(k,) = s[CC].op.reduce_axis
-ko, ki = s[CC].split(k, factor=4)
-s[CC].reorder(ko, xc, ki, yc)
+ko, ki = s[CC].split(k, factor=kfactor)
+s[CC].reorder(ko, mc, ki, nc)
+s[CC].vectorize(nc)
+
+# unroll kfactor loops
+# this is a separate optimization not discussed in this tutorial
Review comment:
If we show it, we should discuss it and explain why it's important, just
as we do for the other optimizations.
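For context, the loop nest that the schedule in the diff produces can be sketched in plain NumPy. This is a hedged illustration, not TVM code: the block sizes `bn=32` and `kfactor=4` are illustrative defaults (the tutorial's actual values may differ), and the real schedule additionally applies `vectorize(nc)` and the unroll being discussed, which this sketch only mimics structurally.

```python
import numpy as np

def blocked_gemm(A, B, bn=32, kfactor=4):
    """Pure-Python sketch of the loop nest the schedule produces:
    tile C into bn x bn blocks (mo, no), accumulate each block in a
    small write cache CC (compute_at no), split the reduction axis k
    by kfactor (ko, ki), and order the inner loops as (ko, mc, ki, nc)
    so the innermost nc loop touches contiguous memory, which is what
    vectorize(nc) exploits in the real TVM schedule."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2 and M % bn == 0 and N % bn == 0 and K % kfactor == 0
    C = np.zeros((M, N), dtype=A.dtype)
    for mo in range(0, M, bn):                 # outer tile loops
        for no in range(0, N, bn):
            # write cache, computed at the no loop
            CC = np.zeros((bn, bn), dtype=A.dtype)
            for ko in range(0, K, kfactor):    # split reduction axis
                for mc in range(bn):           # reorder(ko, mc, ki, nc)
                    for ki in range(kfactor):
                        k = ko + ki
                        # contiguous row update: the "nc" loop,
                        # expressed here as a vectorized slice op
                        CC[mc, :] += A[mo + mc, k] * B[k, no:no + bn]
            C[mo:mo + bn, no:no + bn] = CC
    return C
```

The point of the `kfactor` unroll the comment refers to would be that, once `ki` has a small fixed trip count, the compiler can unroll it and keep the `A[mo + mc, k]` scalars in registers across the vectorized `nc` updates; that rationale is what the review is asking the tutorial to spell out.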