YuchenJin commented on a change in pull request #8825:
URL: https://github.com/apache/tvm/pull/8825#discussion_r696266194
##########
File path: tutorials/optimize/opt_gemm.py
##########
@@ -293,23 +300,26 @@
# Allocate write cache
CC = s.cache_write(C, "global")
-xo, yo, xi, yi = s[C].tile(C.op.axis[0], C.op.axis[1], bn, bn)
+mo, no, mi, ni = s[C].tile(C.op.axis[0], C.op.axis[1], bn, bn)
-# Write cache is computed at yo
-s[CC].compute_at(s[C], yo)
+# Write cache is computed at no
+s[CC].compute_at(s[C], no)
# New inner axes
-xc, yc = s[CC].op.axis
+mc, nc = s[CC].op.axis
(k,) = s[CC].op.reduce_axis
-ko, ki = s[CC].split(k, factor=4)
-s[CC].reorder(ko, xc, ki, yc)
+ko, ki = s[CC].split(k, factor=kfactor)
+s[CC].reorder(ko, mc, ki, nc)
+s[CC].vectorize(nc)
+
+# unroll kfactor loops
+# this is a separate optimization not discussed in this tutorial
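The loop nest this schedule produces can be mirrored in plain Python to make the transformation concrete. This is a hedged illustration only, not TVM-generated code: `bn` and `kfactor` are the tile and split sizes from the tutorial, and the function name `blocked_matmul` is invented here. The `nc` axis, which the schedule vectorizes, is played by a NumPy slice.

```python
import numpy as np

def blocked_matmul(A, B, bn=4, kfactor=2):
    # Mirrors the schedule's loop structure: C is tiled into bn x bn
    # blocks (mo, no, mi, ni), the reduction axis k is split by kfactor
    # (ko, ki), and the write cache CC is computed per tile with loop
    # order (ko, mc, ki, nc). Assumes bn divides M and N, kfactor divides K.
    M, K = A.shape
    _, N = B.shape
    C = np.zeros((M, N), dtype=A.dtype)
    for mo in range(0, M, bn):                       # outer tile loops
        for no in range(0, N, bn):
            CC = np.zeros((bn, bn), dtype=A.dtype)   # write cache, one tile
            for ko in range(0, K, kfactor):          # split reduction axis
                for mc in range(bn):
                    for ki in range(kfactor):
                        k = ko + ki
                        # the innermost nc loop becomes a vector operation
                        # under s[CC].vectorize(nc); a slice stands in for it
                        CC[mc, :] += A[mo + mc, k] * B[k, no:no + bn]
            C[mo:mo + bn, no:no + bn] = CC           # flush cache to C
    return C
```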
Review comment:
Indeed, good catch! It might be useful to add a brief comment explaining
unroll for now, something like: "unrolling is a loop optimization strategy
which can reduce branch prediction failures and increase the chance of
concurrent execution". I agree it's better to discuss unroll in detail in
the future; for example, unroll could be added to [Schedule Primitives in
TVM](https://tvm.apache.org/docs/tutorials/language/schedule_primitives.html).
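To make the suggested comment concrete, here is a minimal sketch of what unrolling does by hand, purely for illustration (the function names are invented, and real unrolling is done by the compiler or by `s[...].unroll(axis)` in TVM, not by rewriting source like this): a 4-way unrolled reduction replaces four loop-bound checks with one and keeps four independent accumulators that can execute concurrently.

```python
def dot_unrolled(a, b):
    # Manual 4-way unroll of a dot-product reduction. Fewer loop-condition
    # branches are evaluated, and the four partial sums have no dependency
    # on each other, so they can be computed in parallel by the hardware.
    # Assumes len(a) == len(b) and is a multiple of 4.
    s0 = s1 = s2 = s3 = 0.0
    for i in range(0, len(a), 4):
        s0 += a[i] * b[i]
        s1 += a[i + 1] * b[i + 1]
        s2 += a[i + 2] * b[i + 2]
        s3 += a[i + 3] * b[i + 3]
    return s0 + s1 + s2 + s3
```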
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]