mbrookhart commented on pull request #6839:
URL: https://github.com/apache/tvm/pull/6839#issuecomment-742658875


   @Laurawly I took the MXNet example you provided and ran it with the debug 
runtime. It required a little bit of editing because the APIs have changed 
slightly since that tutorial was written. This is what I get on my 1070 Ti 
with Thrust enabled.
   
   main:
   ```
   Ops                                Time(us)    Time(%)  Shape
   ---                                --------    -------  -----
   fused_vision_non_max_suppression   139329.0    74.66    (1, 122640, 6)
   fused_vision_get_valid_counts      124.255     0.067    (1, 122640, 6)
   ```
   this PR:
   ```   
   Ops                                Time(us)    Time(%)  Shape
   ---                                --------    -------  -----
   fused_vision_get_valid_counts      46138.3     50.891   (1, 122640, 6)
   fused_vision_non_max_suppression   12319.8     13.589   (1, 122640, 6)
   ```
   
   The get_valid_counts function slows down, but I'm actually seeing the total 
runtime of these two ops decrease from about 139.5ms to 58.5ms.
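
The combined totals can be rechecked from the two profiles with a few lines of Python. This is only a sketch: the dictionary and function names are illustrative, and the numbers are the Time(us) values copied from the tables above, with both ops summed for each branch.

```python
# Per-op times in microseconds, copied from the two debug-runtime profiles.
main_us = {
    "fused_vision_non_max_suppression": 139329.0,
    "fused_vision_get_valid_counts": 124.255,
}
pr_us = {
    "fused_vision_get_valid_counts": 46138.3,
    "fused_vision_non_max_suppression": 12319.8,
}


def total_ms(times_us):
    """Sum per-op times and convert microseconds to milliseconds."""
    return sum(times_us.values()) / 1000.0


main_total = total_ms(main_us)
pr_total = total_ms(pr_us)
speedup = main_total / pr_total

print(f"main:    {main_total:.1f} ms")
print(f"this PR: {pr_total:.1f} ms")
print(f"speedup: {speedup:.2f}x")
```

On these numbers the two ops together run a bit more than twice as fast on this PR, even though get_valid_counts itself is slower.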
   
   My modifications to the example can be found here: 
https://gist.github.com/mbrookhart/df25427cbbfb3c73ed16be72c8525610



