masahi edited a comment on pull request #7137:
URL: https://github.com/apache/tvm/pull/7137#issuecomment-750438126


   @kevinthesun @zhiics I asked torchvision people about boxes with negative 
scores in https://github.com/pytorch/vision/issues/3198 and now I fully 
understand the issue. See in particular this great answer 
https://github.com/pytorch/vision/issues/3198#issuecomment-750346278
   
   My conclusion is that TVM's conversion rule for PyTorch NMS is definitely 
wrong and needs fixing. Here are my takeaways from the above discussion:
   
   * PyTorch detection models use NMS in two places - one in `ROIHead` and 
another in `RegionProposalNetwork`.
   * NMS scores in `ROIHead` are probabilities. There, boxes are filtered by a 
user-chosen score threshold before NMS, so this NMS doesn't send boxes with 
negative scores to TVM.
   * NMS scores in `RegionProposalNetwork` correspond to "objectness", and no 
softmax or sigmoid is applied to the output of the objectness network. The 
scores are estimates of the [logit function](https://en.wikipedia.org/wiki/Logit), 
so a negative score makes perfect sense - it just means the probability is 
< 0.5. A perfectly reasonable box can therefore end up with a negative score. 
We **shouldn't** arbitrarily cut negative boxes from RPN.
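   
   To make the logit point concrete, here is a minimal sketch in plain Python 
(no torchvision involved; the helper names `sigmoid` and `logit` are my own). 
It only illustrates the math: a score below zero maps to a probability below 
0.5, not to an invalid box.

```python
import math


def sigmoid(score: float) -> float:
    """Map a raw logit score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-score))


def logit(p: float) -> float:
    """Inverse of sigmoid: map a probability in (0, 1) back to a logit."""
    return math.log(p / (1.0 - p))


# Illustrative scores only; these are not taken from any real model output.
for score in (-2.0, -0.5, 0.0, 0.5, 2.0):
    print(f"logit={score:+.1f} -> probability={sigmoid(score):.3f}")
```

   A score of exactly 0 corresponds to a probability of 0.5, so the sign of the 
score only says on which side of 0.5 the estimated objectness falls.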
   
   It's quite possible that one of the reasons you didn't get the same output 
from PyTorch detection models after compiling to TVM is this incorrect 
assumption we've been making about negative boxes, because the RPN outputs in 
PyTorch and TVM are totally different - we are considering only about half of 
the boxes.
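   
   A toy example of how such an assumption changes the result. This is a 
pure-Python greedy NMS sketch, not TVM's or torchvision's implementation; the 
`drop_negative` flag is a hypothetical stand-in for discarding boxes with 
non-positive scores, and the boxes and scores are made up.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold, drop_negative=False):
    """Greedy NMS; returns indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    if drop_negative:
        # Hypothetical filter: treat non-positive scores as invalid boxes.
        order = [i for i in order if scores[i] > 0]
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Made-up data: two overlapping boxes and two isolated boxes with negative logits.
boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (50, 50, 60, 60)]
scores = [1.5, 0.8, -0.3, -1.2]

keep_all = nms(boxes, scores, iou_threshold=0.5)
keep_filtered = nms(boxes, scores, iou_threshold=0.5, drop_negative=True)
print(keep_all, keep_filtered)
```

   In this toy case the two isolated boxes survive NMS when negative scores are 
allowed, and disappear when they are filtered out beforehand.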
   
   The good news is that I found a way to reduce the number of boxes that are 
sent to NMS. See these parameters 
https://github.com/pytorch/vision/blob/master/torchvision/models/detection/mask_rcnn.py#L68-L71.
 They act like a `topk` parameter applied to each class separately. The 
default is 1000, and there are five classes/levels in RPN NMS, which explains 
why we are getting about 4500 boxes to RPN NMS. If we set 
`rpn_post_nms_top_n_test` to 200, we will get at most 1000 boxes, 200 boxes for 
each level in the feature pyramid. That should make detection models 
significantly faster while still considering enough boxes to keep accuracy high.
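   
   The per-level cap can be sketched as follows. This is plain Python with 
made-up level sizes and scores, and `top_k_per_level` is a hypothetical helper, 
not a torchvision function. It only shows the arithmetic: with a cap of `k` per 
level and five levels, at most `5 * k` boxes remain, and the selection is by 
rank within the level, not by the sign of the score.

```python
def top_k_per_level(scores_per_level, k):
    """Return, for each level, the indices of its k highest-scoring boxes."""
    kept = []
    for scores in scores_per_level:
        order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        kept.append(order[:k])  # levels with fewer than k boxes keep them all
    return kept


# Made-up sizes for five pyramid levels, with deterministic synthetic scores.
level_sizes = [12, 8, 5, 3, 1]
scores_per_level = [
    [float((i * 37) % 11) - 5.0 for i in range(n)] for n in level_sizes
]

k = 4
kept = top_k_per_level(scores_per_level, k)
total = sum(len(idx) for idx in kept)
print([len(idx) for idx in kept], total)
```

   With these sizes the per-level counts are 4, 4, 4, 3 and 1, so 16 boxes are 
kept, which is within the bound of `5 * k = 20`.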
   
   So please take a careful look at the above issue in torchvision and my 
comment, and let's merge this ASAP. 
   


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

