masahi edited a comment on pull request #7137:
URL: https://github.com/apache/tvm/pull/7137#issuecomment-750960526


   OK, I ran a CPU perf comparison of the old code vs. the new code with the 
default param. As expected, there is almost no perf regression.
   
   ``` 
   Old code: 2.1364135856199997 sec
   After the fix in this PR: 2.1756499899999997 sec
   ```
   The old code spends 5 milliseconds on NMS. Here is the stat from the profile:
   
https://github.com/masahi/torchscript-to-tvm/blob/master/maskrcnn/maskrcnn_cpu_vm_profile_old.txt#L51
   
   The new code feeds far more boxes into NMS, but NMS still takes only 20 milliseconds:
   
https://github.com/masahi/torchscript-to-tvm/blob/master/maskrcnn/maskrcnn_cpu_vm_profile_1000.txt#L24
   
   The reason the new code doesn't make CPU perf worse is simple. Even though 
NMS now gets more boxes, PyTorch detection models do a post-NMS top-k after the 
RPN, see 
https://github.com/pytorch/vision/blob/90645ccd0e774ad76200245e32222a23d09f2312/torchvision/models/detection/rpn.py#L261-L263
   
   So no matter how many boxes NMS receives, the number of boxes after NMS + 
post-NMS top-k doesn't change; it is bounded by the `rpn_post_nms_top_n_test` 
parameter, whose default is 1000. CPU perf is always dominated by the large 
dense layer.
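   To make the bounding argument concrete, here is a minimal plain-Python sketch 
(not the torchvision implementation; the names `filter_proposals`, `nms`, and 
`post_nms_top_n` are illustrative). However many boxes enter NMS, the slice 
taken after NMS caps the output count at `post_nms_top_n`:

   ```python
   def iou(a, b):
       """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
       ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
       ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
       inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
       area_a = (a[2] - a[0]) * (a[3] - a[1])
       area_b = (b[2] - b[0]) * (b[3] - b[1])
       union = area_a + area_b - inter
       return inter / union if union > 0 else 0.0

   def nms(boxes, scores, iou_threshold):
       """Naive O(n^2) NMS; returns kept indices in descending-score order."""
       order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
       keep, suppressed = [], set()
       for i in order:
           if i in suppressed:
               continue
           keep.append(i)
           for j in order:
               if j != i and j not in suppressed and \
                       iou(boxes[i], boxes[j]) > iou_threshold:
                   suppressed.add(j)
       return keep

   def filter_proposals(boxes, scores, iou_threshold=0.7, post_nms_top_n=1000):
       keep = nms(boxes, scores, iou_threshold)
       # Post-NMS top-k: output size is bounded by post_nms_top_n no matter
       # how many boxes went into NMS.
       return keep[:post_nms_top_n]
   ```

   Even if NMS suppresses nothing (e.g. all boxes are disjoint), the final slice 
keeps at most `post_nms_top_n` proposals, so the downstream dense layers always 
see a fixed-size workload.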
   
   Moreover, I'd like to reiterate that the new code is more correct and gives 
essentially the same output as PyTorch. At this point, if you are still not 
happy with this fix, I'm genuinely curious why.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

