comaniac opened a new issue #7730: URL: https://github.com/apache/tvm/issues/7730
PR #7348 removed the explicit broadcast before batch_matmul because batch_matmul already supports implicit broadcast. However, the CuBLAS implementation was not changed accordingly, which results in the failure of the following case:

```python
import numpy as np

import tvm
from tvm import relay
from tvm.contrib import graph_runtime

sa = (4, 128, 768)
sb = (1, 768, 768)
a = relay.var("a", shape=sa)
b = relay.var("b", shape=sb)
c = relay.nn.batch_matmul(a, b)
f = relay.Function([a, b], c)
mod = tvm.ir.IRModule.from_expr(f)
mod = relay.transform.InferType()(mod)

with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="cuda")  # changing target to "cuda -libs=cublas" fails

ctx = tvm.gpu(0)
m = graph_runtime.GraphModule(lib["default"](ctx))
p = np.random.uniform(0, 1, sa)
q = np.random.uniform(0, 1, sb)
m.set_input("a", p)
m.set_input("b", q)

ftimer = m.module.time_evaluator("run", ctx, number=1, repeat=10)
prof_res = np.array(ftimer().results) * 1000
print(np.mean(prof_res))
```

cc @masahi @jwfromm
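
As a side note, until the CuBLAS path handles implicit broadcast, one possible workaround (a sketch on my side, not the actual fix) is to explicitly broadcast the second operand to the full batch size before `batch_matmul`, so the CuBLAS kernel sees matching batch dimensions. `sa` and `sb` below are the shapes from the repro above.

```python
from tvm import relay

sa = (4, 128, 768)
sb = (1, 768, 768)
a = relay.var("a", shape=sa)
b = relay.var("b", shape=sb)

# Explicitly broadcast b's batch dim (1 -> 4) so both operands share the same
# batch size and batch_matmul no longer relies on implicit broadcast.
b_bcast = relay.broadcast_to(b, (sa[0],) + sb[1:])  # shape (4, 768, 768)
c = relay.nn.batch_matmul(a, b_bcast)
f = relay.Function([a, b], c)
```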