masahi commented on PR #15288:
URL: https://github.com/apache/tvm/pull/15288#issuecomment-1630418632

   OK, here is the nvprof output for the shape (1, 1, 4096).
   
   CUTLASS, using the test case in this PR:
   ```
    Time (%)  Total Time (ns)  Instances  Avg (ns)  Med (ns)  Min (ns)  Max (ns)  StdDev (ns)  Name
    --------  ---------------  ---------  --------  --------  --------  --------  -----------  -----------------------------------------------------------------------------------
       100.0            2,176          1   2,176.0   2,176.0     2,176     2,176          0.0  cutlass::rmsnorm_twoPassAlgo_e8(float4 *, const float4 *, const float4 *, int, int)
   ```
   
   dlight, using this script: https://gist.github.com/masahi/cee92512f8953275158c87656cfee22a
   ```
    Time (%)  Total Time (ns)  Instances  Avg (ns)  Med (ns)  Min (ns)  Max (ns)  StdDev (ns)  Name
    --------  ---------------  ---------  --------  --------  --------  --------  -----------  ----------------
       100.0            3,552          1   3,552.0   3,552.0     3,552     3,552          0.0  rms_norm1_kernel
   ```
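
For a quick comparison of the two tables above, here is a minimal Python sketch that computes the ratio and the difference of the reported average kernel times. The variable names and the `relative_slowdown` helper are hypothetical and only for illustration. Note that each table reports a single kernel instance, so this is a one-sample comparison rather than a statistically robust benchmark.

```python
# Average kernel times (ns) copied from the two profiler tables above.
cutlass_ns = 2176.0  # cutlass::rmsnorm_twoPassAlgo_e8
dlight_ns = 3552.0   # rms_norm1_kernel


def relative_slowdown(baseline_ns: float, other_ns: float) -> float:
    """Return how many times longer `other_ns` is than `baseline_ns`."""
    return other_ns / baseline_ns


slowdown = relative_slowdown(cutlass_ns, dlight_ns)
delta_ns = dlight_ns - cutlass_ns

print(f"dlight / CUTLASS = {slowdown:.2f}x, difference = {delta_ns:.0f} ns")
```

On these numbers the dlight kernel takes roughly 1.63x as long as the CUTLASS kernel, a difference of 1,376 ns for this shape.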
   


