csullivan opened a new pull request, #16842:
URL: https://github.com/apache/tvm/pull/16842

   * Use CCL type traits to share common code between the NCCL and MSCCLPP API 
invocations in disco.
   * Add a benchmark to validate results and compare the various supported CCL 
approaches for CUDA. 
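
   As a rough illustration of the traits idea (the `NCCLTraits`/`MSCCLPPTraits` 
names and string-returning stubs below are hypothetical stand-ins, not the real 
NCCL or MSCCLPP APIs), one traits struct per backend lets each disco call site 
be written once as a template:

   ```cpp
   #include <cassert>
   #include <cstdio>
   #include <string>

   // Hypothetical traits structs: each backend exposes the same static
   // interface, so shared code can dispatch on the traits type alone.
   struct NCCLTraits {
     static constexpr const char* name = "nccl";
     static std::string AllReduce(int nbytes) {
       return std::string(name) + ":allreduce:" + std::to_string(nbytes);
     }
   };

   struct MSCCLPPTraits {
     static constexpr const char* name = "mscclpp";
     static std::string AllReduce(int nbytes) {
       return std::string(name) + ":allreduce:" + std::to_string(nbytes);
     }
   };

   // One shared call site covers every backend; the compiler instantiates
   // the correct backend call with no runtime branching.
   template <typename CCL>
   std::string DoAllReduce(int nbytes) {
     return CCL::AllReduce(nbytes);
   }

   int main() {
     assert(DoAllReduce<NCCLTraits>(4096) == "nccl:allreduce:4096");
     assert(DoAllReduce<MSCCLPPTraits>(4096) == "mscclpp:allreduce:4096");
     std::printf("ok\n");
     return 0;
   }
   ```

   The real patch would route actual NCCL/MSCCLPP calls through such traits; 
this sketch only shows the dispatch shape.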
   
   Aggregated profiling results over the sweep of transfer sizes introduced in 
the above-mentioned benchmark, 2**12 through 2**24: 
   ```
    Time (%)  Total Time (ns)  Instances  Avg (ns)  Med (ns)  Min (ns)  Max (ns)  StdDev (ns)  Name
    --------  ---------------  ---------  --------  --------  --------  --------  -----------  ----------------------------------------------------------------------------------------------------
        33.2        277809887       4048   68628.9   45504.0     15104  25069360     545729.8  ncclDevKernel_AllReduce_Sum_f16_RING_LL(ncclDevComm *, unsigned long, ncclWork *)
        24.3        203015590       4040   50251.4   48256.0     12992    315934      29470.2  void tensorrt_llm::twoShotAllReduceKernel<__half, (int)8>(tensorrt_llm::AllReduceParams)
        20.9        174549284       4040   43205.3   39440.0      3711   1085846      52387.0  void tensorrt_llm::oneShotAllReduceKernel<__half, (int)8>(tensorrt_llm::AllReduceParams)
        20.7        173275472       4040   42890.0   40112.0      5375    750653      42864.2  void tvm::runtime::allreduce_simple<__half>(mscclpp::SmChannelDeviceHandle *, const T1 *, T1 *, voi…
         0.8          6985121        120   58209.3   55871.5      9695    158239      35854.8  ncclDevKernel_AllGather_RING_LL(ncclDevComm *, unsigned long, ncclWork *)
   ```
   
   I noted significant variance between runs, so end-to-end measurement or use 
of CUDA graph launch for synchronization could help give a clearer picture.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]
