masahi edited a comment on pull request #7137: URL: https://github.com/apache/tvm/pull/7137#issuecomment-750733915
OK, just as one data point: when I was investigating the GPU NMS performance issue, the old code took 630 milliseconds, while with this fix it takes 2.1 seconds. But again, that's because the new code is dealing with far more boxes, and our GPU NMS code is currently extremely slow due to its sequential loop. According to the numbers I posted in https://github.com/apache/tvm/pull/7154, NMS on CPU is fast: the old code spent only 8 milliseconds there. So I don't expect NMS on CPU to be a big issue. After NMS, the PyTorch detection model does a post-NMS topk, which selects 1000 boxes for later processing. So the perf difference should only be in NMS.
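
To illustrate why the sequential loop matters and why a post-NMS topk bounds the later work, here is a minimal NumPy sketch of greedy NMS followed by a top-k cut. It is not the TVM or PyTorch implementation: the function names, the (x1, y1, x2, y2) box layout, the 0.5 IoU threshold, and the assumption that every box has positive area are all illustrative choices.

```python
import numpy as np


def iou_one_to_many(box, boxes):
    """IoU between one box and an array of boxes, all as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    return inter / (area + areas - inter)


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, ordered by descending score."""
    order = np.argsort(-scores)
    suppressed = np.zeros(len(boxes), dtype=bool)
    keep = []
    # Sequential part: whether box i survives depends on every
    # higher-scoring box that was kept before it, so the cost grows
    # with the number of input boxes.
    for i in order:
        if suppressed[i]:
            continue
        keep.append(i)
        suppressed |= iou_one_to_many(boxes[i], boxes) > iou_threshold
    return np.array(keep, dtype=np.int64)


def post_nms_topk(keep, k=1000):
    """Keep at most k survivors; `keep` is already sorted by score."""
    return keep[:k]
```

Because `post_nms_topk` caps the output at a fixed `k`, anything downstream sees at most `k` boxes regardless of how many boxes went into `greedy_nms`, which is consistent with the cost difference showing up only in the NMS step.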
