masahi edited a comment on pull request #7137:
URL: https://github.com/apache/tvm/pull/7137#issuecomment-751300057


   > One interesting thing about the output result: for PyTorch 1.7, we can 
exactly match the results of TVM vs. PyTorch with this change, but for PyTorch 
1.4 there is still a mismatch, which won't affect final accuracy
   
   Interesting, I didn't test on 1.4. If even the number of output boxes is 
different, then it must be an NMS issue, since NMS is the only thing that could 
change the number of boxes. I'm also fine as long as we get the same output as 
the latest PyTorch.
   
   Thanks for validating my fix. Luckily, I also found a way to parallelize the 
inner loop of GPU NMS, which should give a massive speedup. The change is in 
this commit: 
https://github.com/masahi/tvm/commit/c75c6ef00b1e89d6a6698aeb8048be6e6f13c0ff 
but since I'm away from my GPU at the moment, I haven't tested it yet. It also 
reduces the number of IOU tests from O(N ** 2) to O(# selected boxes * N). 
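
   To illustrate where the O(# selected boxes * N) bound comes from, here is a 
minimal CPU sketch in plain Python. It is not the code from the commit above; 
the names `iou` and `greedy_nms`, the (x1, y1, x2, y2) box layout, and the 
0.5 threshold are assumptions made for illustration. The point is that the 
inner loop only runs for boxes that end up selected, and its iterations are 
independent of each other, which is what makes it a candidate for GPU threads.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS. Returns (kept indices, number of IoU tests performed)."""
    # Visit boxes from highest to lowest score.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    suppressed = [False] * len(boxes)
    keep = []
    num_iou_tests = 0
    for pos, i in enumerate(order):
        if suppressed[i]:
            continue  # a suppressed box never starts an inner loop
        keep.append(i)
        # Iterations of this loop do not depend on each other, so a GPU
        # version could assign one lower-scored box to each thread.
        for j in order[pos + 1:]:
            if suppressed[j]:
                continue
            num_iou_tests += 1
            if iou(boxes[i], boxes[j]) > iou_threshold:
                suppressed[j] = True
    return keep, num_iou_tests


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30), (21, 21, 31, 31)]
    scores = [0.9, 0.8, 0.7, 0.6]
    keep, num_iou_tests = greedy_nms(boxes, scores)
    print(keep, num_iou_tests)
```

   Since the inner loop runs once per selected box and touches at most N 
boxes, the total number of IoU tests is bounded by (# selected boxes * N) 
rather than N ** 2.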
   
   Hopefully next week I can send a PR with a good perf improvement for GPU 
NMS, and hence for PyTorch MaskRCNN performance on GPU.

