masahi commented on pull request #7137: URL: https://github.com/apache/tvm/pull/7137#issuecomment-751300057
> One interesting thing about output result is: for pytorch 1.7, we can exact match the results of tvm vs pt with this change, but for pytorch 1.4 there is still mismatch which won't affect final accuracy

Interesting, I didn't test on 1.4. If even the number of output boxes is different, then it must be an NMS issue, since NMS is the only thing that could change the number of boxes. I'm also fine if we can get the same output as the latest PyTorch.

Thanks for validating my fix. Luckily, I also found a way to parallelize the inner loop of GPU NMS, which should give a massive speedup. The change is in this commit: https://github.com/masahi/tvm/commit/c75c6ef00b1e89d6a6698aeb8048be6e6f13c0ff

Since I'm away from my GPU at the moment, I haven't tested it yet. It also reduces the number of IOU tests from O(N ** 2) to O(# selected boxes * N). Hopefully next week I can send a PR with a good perf improvement for GPU NMS, and hence for PyTorch MaskRCNN performance on GPU.
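To illustrate the complexity claim, here is a minimal NumPy sketch of greedy NMS in which the outer loop only does work for boxes that end up selected, and the inner loop is a single vectorized pass over the remaining boxes (the part that could run in parallel on a GPU). This is not the code from the linked commit: the function names `iou_one_vs_many` and `nms_selected_outer`, the `(x1, y1, x2, y2)` box layout, and the default threshold are all assumptions made for illustration.

```python
import numpy as np


def iou_one_vs_many(box, boxes):
    """IOU between one box and an (M, 4) array of boxes, all as (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], boxes[:, 0])
    y1 = np.maximum(box[1], boxes[:, 1])
    x2 = np.minimum(box[2], boxes[:, 2])
    y2 = np.minimum(box[3], boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    union = area + areas - inter
    return inter / np.maximum(union, 1e-12)  # guard against zero-area boxes


def nms_selected_outer(boxes, scores, iou_threshold=0.5):
    """Greedy NMS; returns (kept original indices, number of IOU tests performed)."""
    order = np.argsort(-scores, kind="stable")  # highest score first
    sorted_boxes = boxes[order]
    n = len(sorted_boxes)
    suppressed = np.zeros(n, dtype=bool)
    keep = []
    num_iou_tests = 0
    for i in range(n):
        if suppressed[i]:
            continue  # suppressed boxes never trigger any IOU tests
        keep.append(int(order[i]))
        rest = sorted_boxes[i + 1:]
        # Inner loop: one selected box against all lower-scored boxes.
        # Each comparison is independent, so this pass is parallelizable.
        ious = iou_one_vs_many(sorted_boxes[i], rest)
        num_iou_tests += len(rest)
        suppressed[i + 1:] |= ious > iou_threshold
    return keep, num_iou_tests


if __name__ == "__main__":
    boxes = np.array(
        [[0, 0, 10, 10], [1, 1, 11, 11], [50, 50, 60, 60], [51, 51, 61, 61]],
        dtype=np.float64,
    )
    scores = np.array([0.9, 0.8, 0.7, 0.6])
    print(nms_selected_outer(boxes, scores))
```

Because an inner pass is launched only for boxes that survive, the total number of IOU tests is bounded by (# selected boxes) * N rather than growing with N ** 2 when most boxes are suppressed.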
