echuraev commented on code in PR #14519:
URL: https://github.com/apache/tvm/pull/14519#discussion_r1170860570


##########
python/tvm/topi/arm_cpu/depthwise_conv2d.py:
##########
@@ -394,7 +394,8 @@ def schedule_conv_out(out):
             ci_outer, ci_inner = s[out].split(ci, 4)
             s[out].vectorize(ci_inner)
             s[out].unroll(ci_outer)
-
+        else:
+            s[out].vectorize(ci)

Review Comment:
   This split will only work when running without tuning statistics (i.e., in
default mode). I suggest restricting the search space during tuning by adding a
`filter` to `define_split`. This will help reduce tuning time and drop useless
configurations.
   By the way, I reread your message and noticed that I was probably wrong to
suggest restricting the vectorization size to 32. You wrote that it stores
32-bit floats, which means that the maximum possible vectorization width is 8,
am I right?
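   The vectorization limit comes from dividing the vector register width by the
element width. A minimal sketch, where the register widths are assumptions: 8
lanes of 32-bit floats corresponds to a 256-bit register, while a 128-bit
register (the width of an Arm NEON register) holds 4.

```python
def vector_lanes(register_bits: int, element_bits: int) -> int:
    """Number of elements of `element_bits` width that fit in one SIMD register."""
    if register_bits % element_bits != 0:
        raise ValueError("register width must be a multiple of the element width")
    return register_bits // element_bits


print(vector_lanes(128, 32))  # 4 fp32 lanes, e.g. one 128-bit NEON register
print(vector_lanes(256, 32))  # 8 fp32 lanes
```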



-- 
This is an automated message from the Apache Git Service.