comaniac opened a new issue #7730:
URL: https://github.com/apache/tvm/issues/7730
PR #7348 dropped the explicit broadcast before `batch_matmul`, since `batch_matmul` already supports implicit broadcasting. However, the CuBLAS implementation was not updated accordingly, which results in the failure of the following case:
```python
import numpy as np
import tvm
from tvm import relay
from tvm.contrib import graph_runtime
sa = (4, 128, 768)
sb = (1, 768, 768)
a = relay.var("a", shape=sa)
b = relay.var("b", shape=sb)
c = relay.nn.batch_matmul(a, b)
f = relay.Function([a, b], c)
mod = tvm.ir.IRModule.from_expr(f)
mod = relay.transform.InferType()(mod)
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="cuda")  # changing target to "cuda -libs=cublas" will fail
ctx = tvm.gpu(0)
m = graph_runtime.GraphModule(lib["default"](ctx))
# Cast to float32 to match the default dtype of the Relay vars.
p = np.random.uniform(0, 1, sa).astype("float32")
q = np.random.uniform(0, 1, sb).astype("float32")
m.set_input("a", p)
m.set_input("b", q)
ftimer = m.module.time_evaluator("run", ctx, number=1, repeat=10)
prof_res = np.array(ftimer().results) * 1000
print(np.mean(prof_res))
```
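For reference, here is a minimal NumPy sketch of the broadcast semantics the reproducer expects, assuming the default transposed-second-operand convention of `relay.nn.batch_matmul`; the shapes are taken from the script above:

```python
import numpy as np

sa = (4, 128, 768)   # shape of input "a"
sb = (1, 768, 768)   # shape of input "b" (batch dim 1 should broadcast to 4)
p = np.random.uniform(0, 1, sa).astype("float32")
q = np.random.uniform(0, 1, sb).astype("float32")

# nn.batch_matmul(A, B) computes A @ B^T on the last two axes; the batch
# dimension of B (here 1) is implicitly broadcast against that of A.
expected = np.matmul(p, np.transpose(q, (0, 2, 1)))
print(expected.shape)  # (4, 128, 768)
```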
cc @masahi @jwfromm