adfwer233 commented on code in PR #15389:
URL: https://github.com/apache/tvm/pull/15389#discussion_r1272934493
##########
python/tvm/dlight/gpu/matmul.py:
##########
@@ -290,6 +507,16 @@ def is_spatial(block: BlockRV) -> bool:
             return None
         matmul_index_map, a_index_map, b_index_map, c_index_map = index_maps
+        if target.kind.name == "cuda" and check_sm_version(target.arch) > 70:
+            apply_tensorization: bool = True
+            for item_var in block_stmt.iter_vars:
+                extent = item_var.dom.extent
+                if isinstance(extent, tir.expr.IntImm):
+                    if extent.value > 1 and extent.value <= 128:
Review Comment:
In my observation, this tensorization rule underperforms the original dlight
matmul rule for small sizes, so I chose 128 as the threshold for tensorization.
I will add a brief explanation here and try to make the check more elegant.
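
A minimal sketch of the heuristic described above, written as plain Python so it runs without TVM. The names `should_tensorize` and `MIN_TENSORIZE_EXTENT` are hypothetical, dynamic extents are modelled as `None` rather than as TIR expressions, and the sketch assumes that the branch body (not shown in the hunk) disables tensorization:

```python
from typing import Optional, Sequence

# Hypothetical name for the threshold discussed in this comment.
MIN_TENSORIZE_EXTENT = 128


def should_tensorize(
    extents: Sequence[Optional[int]],
    threshold: int = MIN_TENSORIZE_EXTENT,
) -> bool:
    """Return False if any static extent is small enough that the
    non-tensorized matmul rule is expected to be faster.

    `extents` holds one entry per iteration variable: an int for a
    static extent, or None for a dynamic (symbolic) one.
    """
    for extent in extents:
        if extent is None:
            # Dynamic extent: nothing to compare, so it is skipped,
            # mirroring the isinstance(extent, IntImm) guard in the diff.
            continue
        # Extent 1 (e.g. a unit batch axis) is ignored; anything in
        # (1, threshold] is assumed to turn tensorization off.
        if 1 < extent <= threshold:
            return False
    return True
```

Under these assumptions, `should_tensorize([1, 4096, 4096, 4096])` keeps tensorization enabled, while a single 64-wide axis, as in `should_tensorize([1, 64, 4096, 4096])`, falls back to the original rule. Pulling the constant out into a named threshold is one way to make the check easier to explain and tune.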
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]