masahi edited a comment on pull request #7137:
URL: https://github.com/apache/tvm/pull/7137#issuecomment-750960526


   OK, I ran a CPU perf comparison of the old code vs. the new code with the 
default param. As expected, there is almost no perf regression.
   
   ``` 
   Old code: 2.1364135856199997 sec
   After the fix in this PR: 2.1756499899999997 sec
   ```
   The old code spends 5 milliseconds on NMS. Here is the stat from the profile:
   
https://github.com/masahi/torchscript-to-tvm/blob/master/maskrcnn/maskrcnn_cpu_vm_profile_old.txt#L51
   
   The new code feeds far more boxes into NMS, but NMS still takes only 20 milliseconds:
   
https://github.com/masahi/torchscript-to-tvm/blob/master/maskrcnn/maskrcnn_cpu_vm_profile_1000.txt#L24
   
   The reason the new code doesn't make CPU perf worse is simple. Even though 
NMS now gets more boxes, PyTorch detection models do a post-NMS top-k after the 
RPN, see 
https://github.com/pytorch/vision/blob/90645ccd0e774ad76200245e32222a23d09f2312/torchvision/models/detection/rpn.py#L261-L263
   
   So no matter how many boxes NMS receives, the number of boxes after NMS + 
post-NMS top-k doesn't change; it is bounded by the `rpn_post_nms_top_n_test` 
parameter, whose default is 1000. CPU perf is always dominated by the large 
dense layer.
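   To make the bounding argument concrete, here is a minimal plain-Python sketch 
(not the torchvision implementation; the names `filter_proposals`, `nms`, and 
`post_nms_top_n` are illustrative). However many boxes enter NMS, the slice 
taken after NMS caps the output count at `post_nms_top_n`:

   ```python
   def iou(a, b):
       """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
       ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
       ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
       inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
       area_a = (a[2] - a[0]) * (a[3] - a[1])
       area_b = (b[2] - b[0]) * (b[3] - b[1])
       union = area_a + area_b - inter
       return inter / union if union > 0 else 0.0

   def nms(boxes, scores, iou_threshold):
       """Naive O(n^2) NMS; returns kept indices in descending-score order."""
       order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
       keep, suppressed = [], set()
       for i in order:
           if i in suppressed:
               continue
           keep.append(i)
           for j in order:
               if j != i and j not in suppressed and \
                       iou(boxes[i], boxes[j]) > iou_threshold:
                   suppressed.add(j)
       return keep

   def filter_proposals(boxes, scores, iou_threshold=0.7, post_nms_top_n=1000):
       keep = nms(boxes, scores, iou_threshold)
       # Post-NMS top-k: output size is bounded by post_nms_top_n no matter
       # how many boxes went into NMS.
       return keep[:post_nms_top_n]
   ```

   Even if NMS suppresses nothing (e.g. all boxes are disjoint), the final slice 
keeps at most `post_nms_top_n` proposals, so the downstream dense layers always 
see a fixed-size workload.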
   
   Moreover, I'd like to reiterate that the new code is more correct and gives 
essentially the same output as PyTorch. At this point, if you are still not 
happy with this fix, I'm genuinely curious why.
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

