mbrookhart commented on pull request #6839:
URL: https://github.com/apache/tvm/pull/6839#issuecomment-742884803
I found a mix of serial and threaded portions of NMS that makes it fast
while still passing the tests:
```
fused_vision_get_valid_counts
3674.62 10.87 (1, 122640, 6)
fused_vision_non_max_suppression
402.791 1.191 (1, 122640, 6)
```
I'll go back to parallelizing the sum/scan on get_valid_counts tomorrow, but
at this point, this is ~30x faster than main.
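
For anyone following along, here is a rough NumPy sketch of the scan idea for get_valid_counts. This is not the code in this PR: the function name `get_valid_counts_scan`, the default `score_index=1`, and filling invalid slots with -1 are assumptions for illustration. The point is that an exclusive prefix sum over the validity mask gives each valid box its output slot, so the compaction no longer needs a serial loop over anchors.

```python
import numpy as np


def get_valid_counts_scan(data, score_threshold=0.0, score_index=1):
    """Compact valid boxes to the front of each batch using a prefix sum.

    data: array of shape (batch, num_anchors, elem_length).
    Returns (valid_count, out), where out has the same shape as data and
    unused trailing slots are filled with -1 (an assumption of this sketch).
    """
    batch = data.shape[0]
    # 1 where the box score passes the threshold, 0 otherwise.
    valid = (data[:, :, score_index] > score_threshold).astype(np.int64)
    # Inclusive scan along the anchor axis; the last entry is the count.
    inclusive = np.cumsum(valid, axis=1)
    # Exclusive scan: the destination slot of each valid box.
    exclusive = inclusive - valid
    valid_count = inclusive[:, -1].astype(np.int32)

    out = np.full_like(data, -1.0)
    for b in range(batch):
        mask = valid[b].astype(bool)
        # Scatter: each valid box goes to the slot given by the scan.
        out[b, exclusive[b][mask]] = data[b][mask]
    return valid_count, out


# Small example: three boxes, the middle one has a negative score.
boxes = np.array(
    [[[0.0, 0.9, 1.0, 1.0, 2.0, 2.0],
      [0.0, -1.0, 0.0, 0.0, 1.0, 1.0],
      [1.0, 0.5, 3.0, 3.0, 4.0, 4.0]]]
)
counts, compacted = get_valid_counts_scan(boxes)
```

On a GPU the `np.cumsum` step would be replaced by a parallel scan, and the scatter is independent per anchor, which is why the scan is the piece worth parallelizing.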