mbrookhart commented on pull request #6839:
URL: https://github.com/apache/tvm/pull/6839#issuecomment-742884803


   I found a mix of serial and threaded portions of NMS that makes it fast while still passing the tests:
   ```
   fused_vision_get_valid_counts       3674.62    10.87    (1, 122640, 6)
   fused_vision_non_max_suppression    402.791    1.191    (1, 122640, 6)
   ```
   I'll go back to parallelizing the sum/scan on get_valid_counts tomorrow, but 
at this point, this is ~30x faster than main.
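
For readers unfamiliar with the sum/scan step mentioned above, here is a minimal pure-Python sketch of how a prefix sum can drive the compaction in a get_valid_counts-style op. It is not the code in this PR: the function name `compact_valid`, the score-only input, and the `-1` padding are assumptions made for illustration. The point is that once the exclusive scan of the validity flags is known, every valid box knows its output slot independently, so the scatter can run in parallel and only the scan itself needs a parallel algorithm.

```python
from itertools import accumulate


def compact_valid(scores, threshold):
    """Return (count, out), where out lists the indices of valid scores first.

    Hypothetical helper for illustration only; slots past `count` hold -1.
    """
    # 1 where the score passes the threshold, 0 otherwise.
    flags = [1 if s > threshold else 0 for s in scores]

    # Inclusive prefix sum of the flags; the last entry is the valid count.
    inclusive = list(accumulate(flags))
    count = inclusive[-1] if inclusive else 0

    # Exclusive prefix sum: for a valid element, the number of valid
    # elements before it, which is exactly its output slot.
    slots = [inc - f for inc, f in zip(inclusive, flags)]

    # Scatter step: each iteration writes a distinct slot, so on a GPU
    # this loop could be one thread per element.
    out = [-1] * len(scores)
    for i, f in enumerate(flags):
        if f:
            out[slots[i]] = i
    return count, out
```

Calling `compact_valid([0.9, 0.1, 0.8, 0.0], 0.5)` marks elements 0 and 2 as valid and moves their indices to the front of the output.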

