masahi commented on PR #15288:
URL: https://github.com/apache/tvm/pull/15288#issuecomment-1630418632
OK, here is the nvprof output for the shape (1, 1, 4096).
CUTLASS, using the test case in this PR:
```
Time (%)  Total Time (ns)  Instances  Avg (ns)  Med (ns)  Min (ns)  Max (ns)  StdDev (ns)  Name
--------  ---------------  ---------  --------  --------  --------  --------  -----------  ----
   100.0            2,176          1   2,176.0   2,176.0     2,176     2,176          0.0  cutlass::rmsnorm_twoPassAlgo_e8(float4 *, const float4 *, const float4 *, int, int)
```
dlight, using this script:
https://gist.github.com/masahi/cee92512f8953275158c87656cfee22a
```
Time (%)  Total Time (ns)  Instances  Avg (ns)  Med (ns)  Min (ns)  Max (ns)  StdDev (ns)  Name
--------  ---------------  ---------  --------  --------  --------  --------  -----------  ----
   100.0            3,552          1   3,552.0   3,552.0     3,552     3,552          0.0  rms_norm1_kernel
```
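For a quick side-by-side, the two timings can be reduced to a ratio. This is only a rough sketch: the variable names are illustrative, and each table reports a single kernel instance, so the result is indicative rather than a proper benchmark.

```python
# Kernel times in nanoseconds, copied from the two profiler tables above.
cutlass_ns = 2176  # cutlass::rmsnorm_twoPassAlgo_e8
dlight_ns = 3552   # rms_norm1_kernel

ratio = dlight_ns / cutlass_ns     # how much longer the dlight kernel ran
extra_ns = dlight_ns - cutlass_ns  # absolute gap in nanoseconds

print(f"dlight / CUTLASS = {ratio:.2f}x, difference = {extra_ns} ns")
# prints: dlight / CUTLASS = 1.63x, difference = 1376 ns
```

So for this shape the dlight kernel took roughly 1.6x as long as the CUTLASS one in this single run.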