comaniac opened a new issue #7730:
URL: https://github.com/apache/tvm/issues/7730


   PR #7348 removed the explicit broadcast before batch_matmul, because batch_matmul already supports implicit broadcasting. However, the CuBLAS implementation was not updated accordingly, which results in the following case failing:
   
   ```python
   import numpy as np
   
   import tvm
   from tvm import relay
   from tvm.contrib import graph_runtime
   
   sa = (4, 128, 768)
   sb = (1, 768, 768)
   
   a = relay.var("a", shape=sa)
   b = relay.var("b", shape=sb)
   c = relay.nn.batch_matmul(a, b)
   f = relay.Function([a, b], c)
   mod = tvm.ir.IRModule.from_expr(f)
   mod = relay.transform.InferType()(mod)
   
   with tvm.transform.PassContext(opt_level=3):
       lib = relay.build(mod, target="cuda")  # changing the target to "cuda -libs=cublas" makes this fail
   
   ctx = tvm.gpu(0)
   m = graph_runtime.GraphModule(lib["default"](ctx))
   p = np.random.uniform(0, 1, sa).astype("float32")  # match the default float32 dtype of the Relay vars
   q = np.random.uniform(0, 1, sb).astype("float32")
   m.set_input("a", p)
   m.set_input("b", q)
   
   ftimer = m.module.time_evaluator("run", ctx, number=1, repeat=10)
   prof_res = np.array(ftimer().results) * 1000
   print(np.mean(prof_res))
   ```
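   
   For reference, here is a minimal NumPy sketch (not TVM API) of the implicit-broadcast semantics `batch_matmul` is expected to provide; note that `relay.nn.batch_matmul(a, b)` contracts `a` with the transpose of `b` over the last two axes:
   
   ```python
   import numpy as np
   
   sa = (4, 128, 768)
   sb = (1, 768, 768)
   
   a = np.random.uniform(0, 1, sa).astype("float32")
   b = np.random.uniform(0, 1, sb).astype("float32")
   
   # relay.nn.batch_matmul computes a @ b^T over the last two axes, with the
   # batch dimension of b (1) implicitly broadcast against that of a (4).
   expected = np.matmul(a, b.transpose(0, 2, 1))
   print(expected.shape)  # (4, 128, 768)
   ```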
   
   cc @masahi @jwfromm 

