Anndrey24 commented on code in PR #16899:
URL: https://github.com/apache/tvm/pull/16899#discussion_r1572475838


##########
python/tvm/topi/arm_cpu/conv2d_gemm.py:
##########
@@ -478,23 +498,21 @@ def schedule_conv2d_gemm_native(cfg, s, out, final_out):
         s[C].tensorize(y_inner, gemm_acc)
         s[C].parallel(x_outer)
     else:
-        k_outer, k_inner = s[C].split(k, 4)
-        x_outer, y_outer, x_inner, y_inner = s[C].tile(x, y, x_factor=4, 
y_factor=y_tile_size)
-        y_inner_outer, y_inner_inner = s[C].split(y_inner, nparts=4)
+        k_outer, k_inner = s[C].split(k, factor=4)
+        x_outer, x_inner = s[C].split(x, factor=4)
+        y_outer, y_inner = s[C].split(y, factor=y_tile_size, 
disable_predication=True)

Review Comment:
   In theory it should be, as the two possible compute definitions of C in the 
hybrid implementation have the shape `(batches, M_padded, N_padded)` in the 
[scalable](https://github.com/apache/tvm/pull/16899/files#diff-42b1313a1be464c7f2c94f75d656be725f1ccb54b9391cd5b27c33009ac0e2d5R332)
 and 
[non-scalable](https://github.com/apache/tvm/pull/16899/files#diff-42b1313a1be464c7f2c94f75d656be725f1ccb54b9391cd5b27c33009ac0e2d5R348)
 cases.  
   
   However, you are right that in practice the padding might get removed by one 
of the tir passes. That is prevented for the scalable cases by this dummy 
padding 
[here](https://github.com/apache/tvm/pull/16899/files#diff-42b1313a1be464c7f2c94f75d656be725f1ccb54b9391cd5b27c33009ac0e2d5R341)
 and in the non-scalable fp16 case by the one 
[here](https://github.com/apache/tvm/pull/16899/files#diff-42b1313a1be464c7f2c94f75d656be725f1ccb54b9391cd5b27c33009ac0e2d5L346).
 I believe I didn't use the same trick for the non-scalable fp32 case because I 
noticed performance being better when allowing the padding to be removed in 
some cases.  
   
   What do you think about changing it to 
`disable_predication=use_scalable_vectors` in order to leave the non-scalable 
case as it was before?  
   Should we have any more guard rails than this? It might be overkill, but 
something like this also comes to mind: `disable_predication = 
analyzer.can_prove(y.dom.extent % y_tile_size  == 0)`



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to