FranklandJack commented on code in PR #14483:
URL: https://github.com/apache/tvm/pull/14483#discussion_r1158807348


##########
python/tvm/topi/arm_cpu/conv2d_spatial_pack.py:
##########
@@ -316,12 +317,23 @@ def _tile_size(axis, candidates):
                     return candidate
             return 1
 
-        # Tile size 8 results in efficient vectorization for these schedules.
-        # If the axis is not divisible by 8, try 4
+        # For data tensors with unity height and width we can leave it to the
+        # backend to vectorize the inner loop. This has been observed to be more
+        # performant on SVE targets with a vector width > 128bits.
+        target = Target.current(allow_none=False)
+        if target.features.has_sve and OW == OH and OW == 1:

Review Comment:
   Good question; no, that hasn't been observed. In the cases we looked at, the
LLVM backend chose the optimal vectorization (fixed-length vs. scalable)
depending on the number of output channels. If we were forcing the use of
scalable vectors, though, I think you are right that we would need a heuristic
here :)
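For context, the `_tile_size` helper shown in the hunk above picks the first candidate tile size that evenly divides the axis extent (e.g. try 8, then 4), falling back to 1. A minimal standalone sketch of that rule, with names simplified from the TVM schedule code and the extent passed as a plain integer rather than a schedule axis, might look like:

```python
def tile_size(extent, candidates=(8, 4)):
    """Return the first candidate that evenly divides `extent`, else 1.

    Mirrors the first-divisor rule in `_tile_size`: larger tiles vectorize
    more efficiently, so candidates are tried from largest to smallest.
    """
    for candidate in candidates:
        if extent % candidate == 0:
            return candidate
    return 1


if __name__ == "__main__":
    print(tile_size(16))  # 8 divides 16, so the largest candidate wins
    print(tile_size(12))  # 8 does not divide 12, but 4 does
    print(tile_size(7))   # no candidate divides 7, fall back to 1
```

The SVE special case in the diff then bypasses this choice entirely when the output height and width are both 1, leaving vectorization of the inner loop to the LLVM backend.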



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
