YuchenJin commented on a change in pull request #8825:
URL: https://github.com/apache/tvm/pull/8825#discussion_r696266194



##########
File path: tutorials/optimize/opt_gemm.py
##########
@@ -293,23 +300,26 @@
 # Allocate write cache
 CC = s.cache_write(C, "global")
 
-xo, yo, xi, yi = s[C].tile(C.op.axis[0], C.op.axis[1], bn, bn)
+mo, no, mi, ni = s[C].tile(C.op.axis[0], C.op.axis[1], bn, bn)
 
-# Write cache is computed at yo
-s[CC].compute_at(s[C], yo)
+# Write cache is computed at no
+s[CC].compute_at(s[C], no)
 
 # New inner axes
-xc, yc = s[CC].op.axis
+mc, nc = s[CC].op.axis
 
 (k,) = s[CC].op.reduce_axis
-ko, ki = s[CC].split(k, factor=4)
-s[CC].reorder(ko, xc, ki, yc)
+ko, ki = s[CC].split(k, factor=kfactor)
+s[CC].reorder(ko, mc, ki, nc)
+s[CC].vectorize(nc)
+
+# unroll kfactor loops
+# this is a separate optimization not discussed in this tutorial
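For readers following along, the loop nest that the tiling, write-cache, split, and reorder primitives above produce can be sketched in plain Python. This is an illustrative sketch only (the names `bn` and `kfactor` mirror the tutorial's parameters; TVM would of course generate compiled code, not this interpreted loop):

```python
import numpy as np

def blocked_matmul(A, B, bn=4, kfactor=4):
    """Plain-Python sketch of the schedule above: tile C into bn x bn
    blocks (mo/no), accumulate into a write cache CC computed at no,
    and split the reduction axis k by kfactor (ko/ki)."""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2 and M % bn == 0 and N % bn == 0 and K % kfactor == 0
    C = np.zeros((M, N))
    for mo in range(M // bn):
        for no in range(N // bn):
            CC = np.zeros((bn, bn))  # write cache, computed at no
            for ko in range(K // kfactor):
                for mc in range(bn):
                    for ki in range(kfactor):
                        k = ko * kfactor + ki
                        # this innermost nc loop is the axis that
                        # s[CC].vectorize(nc) turns into vector code
                        for nc in range(bn):
                            CC[mc, nc] += (A[mo * bn + mc, k]
                                           * B[k, no * bn + nc])
            C[mo * bn:(mo + 1) * bn, no * bn:(no + 1) * bn] = CC
    return C
```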

Review comment:
       Indeed, good catch! It might be useful to add a brief comment explaining 
unroll for now, something like "unrolling is a loop optimization strategy 
which can reduce branch prediction failures and increase the chance of 
concurrent execution". I agree it's better to discuss unroll in detail in 
the future; for example, unroll could be added to [Schedule Primitives in 
TVM](https://tvm.apache.org/docs/tutorials/language/schedule_primitives.html).
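
As a generic illustration of what the suggested comment describes (this is plain Python, not TVM's `unroll` schedule primitive), manual unrolling replicates the loop body so the loop-back branch executes fewer times and the independent accumulators can run concurrently:

```python
def dot_rolled(a, b):
    # baseline: one branch test per element
    acc = 0.0
    for i in range(len(a)):
        acc += a[i] * b[i]
    return acc

def dot_unrolled4(a, b):
    # unrolled by 4 (assumes len(a) is a multiple of 4): one branch
    # test per 4 elements, with 4 independent accumulators that a CPU
    # can execute concurrently
    acc0 = acc1 = acc2 = acc3 = 0.0
    for i in range(0, len(a), 4):
        acc0 += a[i] * b[i]
        acc1 += a[i + 1] * b[i + 1]
        acc2 += a[i + 2] * b[i + 2]
        acc3 += a[i + 3] * b[i + 3]
    return acc0 + acc1 + acc2 + acc3
```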




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]