jcf94 opened a new pull request #9650:
URL: https://github.com/apache/tvm/pull/9650
Recently I ran into a performance issue with a simple case. The test script is shown below:
```python
import numpy as np
import tvm
from tvm import testing
from tvm import relay
from tvm.runtime import profiler_vm


def zeros_graph():
    # The output shape is a runtime input, so relay.zeros lowers to dyn.zeros.
    a = relay.var("shape", shape=[5], dtype="int32")
    b = relay.zeros(a, dtype="float32")
    func = relay.Function([a], b)
    mod = tvm.IRModule()
    mod["main"] = func
    return mod


target = tvm.target.Target("cuda")
dev = tvm.cuda(0)
a_data = np.array([1, 320, 168, 10, 24], dtype="int32")
b_data = np.zeros(a_data, dtype="float32")
a_tvm = tvm.nd.array(a_data, dev)
b_tvm = tvm.nd.empty(b_data.shape, b_data.dtype, dev)


def test_speed_and_check_vm(mod, target, buffers):
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.vm.compile(mod, target=target)
    run_mod = profiler_vm.VirtualMachineProfiler(lib, dev)
    run_mod.run(*buffers)
    b_tvm = run_mod.get_output(0)
    # Time the whole "main" function through the VM's "invoke" entry point.
    evaluator = run_mod.module.time_evaluator("invoke", dev, min_repeat_ms=500)
    costs = np.median(evaluator("main").results)
    print("Execution time of this operator: %.3f ms" % (costs * 1e3))
    testing.assert_allclose(b_data, b_tvm.numpy())
    # Per-operator breakdown collected by the profiling VM.
    print(run_mod.profile())


mod = zeros_graph()
test_speed_and_check_vm(mod, target, [a_tvm])
```
On my A100 GPU, this simple `zeros` op takes about 5 ms:
```bash
Execution time of this operator: 5.308 ms
Name                  Duration (us)  Percent  Device  Hash              Argument Shapes                         Count
fused_dyn_zeros             5315.64    49.64  cuda0   3e3cf66c3e738993  int32[5], float32[1, 320, 168, 10, 24]      1
shape_func_dyn_zeros           3.13     0.03  cpu0    3e3cf66c3e738993  int32[5], int64[5]                          1
fused_prod                     0.68     0.01  cpu0    64ff7b71305dadd2  int64[5], int64[]                           1
fused_multiply                 0.52     0.00  cpu0    e2e2680f0ff08f46  int64[], int64[]                            1
----------
Sum                         5319.97    49.68                                                                        4
Total                       5455.01           cuda0                                                                 1
Total                      10709.44           cpu0                                                                  1
```
Finally, I found that https://github.com/apache/tvm/pull/8555 needs an additional patch: declaring the shape with `te.var` seems to block some expression simplification rules.
After changing `te.var` to `te.size_var`, the result is:
```bash
Execution time of this operator: 0.335 ms
Name                  Duration (us)  Percent  Device  Hash              Argument Shapes                         Count
fused_dyn_zeros              328.92    43.66  cuda0   3e3cf66c3e738993  int32[5], float32[1, 320, 168, 10, 24]      1
shape_func_dyn_zeros           4.61     0.61  cpu0    3e3cf66c3e738993  int32[5], int64[5]                          1
fused_prod                     1.30     0.17  cpu0    64ff7b71305dadd2  int64[5], int64[]                           1
fused_multiply                 1.11     0.15  cpu0    e2e2680f0ff08f46  int64[], int64[]                            1
----------
Sum                          335.95    44.59                                                                        4
Total                        520.00           cuda0                                                                 1
Total                        753.41           cpu0                                                                  1
```
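As a rough sanity check derived from the two `fused_dyn_zeros` rows in the profiles above (these figures are computed here, not reported by the profiler), the output tensor holds 12,902,400 float32 values, so the kernel speedup and the effective write bandwidth can be estimated as follows. The helper `write_bandwidth_gbs` is hypothetical and counts only the bytes written to the output:

```python
# Back-of-the-envelope numbers derived from the two profiler outputs.
from math import prod

shape = (1, 320, 168, 10, 24)  # output shape of the zeros op
bytes_per_elem = 4             # float32

num_elems = prod(shape)
num_bytes = num_elems * bytes_per_elem

before_us = 5315.64  # fused_dyn_zeros with te.var
after_us = 328.92    # fused_dyn_zeros with te.size_var


def write_bandwidth_gbs(nbytes, duration_us):
    """Hypothetical helper: effective write bandwidth in GB/s (1 GB = 1e9 bytes)."""
    return nbytes / (duration_us * 1e-6) / 1e9


print(f"elements: {num_elems}, bytes written: {num_bytes}")
print(f"kernel speedup: {before_us / after_us:.1f}x")
print(f"effective bandwidth before: {write_bandwidth_gbs(num_bytes, before_us):.1f} GB/s")
print(f"effective bandwidth after:  {write_bandwidth_gbs(num_bytes, after_us):.1f} GB/s")
```

This puts the kernel-level speedup at roughly 16x, going from about 9.7 GB/s to about 157 GB/s of effective write bandwidth.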