mbrookhart opened a new pull request #6839: URL: https://github.com/apache/incubator-tvm/pull/6839
This PR adds ONNX support for NonMaxSuppression. The ONNX API is a little odd; it roughly follows the TF op combined_non_max_suppression. To implement it in TVM, I decided to go with Relay while loops instead of writing a new TOPI kernel.

Getting it to work on CUDA was a pain. First, I needed to change a couple of input values from attributes to parameters. Then I found that the CUDA kernels were out of spec and their tests were being skipped. Implementing those tests and getting them to pass required refactoring some of the kernels to bring them into spec, and then removing CUDA threads in several places so that the kernels return the correct results.

I labeled this WIP because I'd like to spend more time figuring out how to speed up the CUDA kernels, but I'd love ideas from anyone who's interested in looking at it.

@jroesch @jwfromm @zhiics @tkonolige @csullivan

@Laurawly @yongwww @kevinthesun, because you've touched the CUDA NMS file, could you take a look?

Thanks,
Matthew
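
For readers unfamiliar with the ONNX op, the following is a minimal NumPy sketch of the NonMaxSuppression semantics as described in the ONNX operator spec. It is not the Relay implementation from this PR. The function names `box_iou` and `onnx_nms_reference` are hypothetical, and the sketch assumes `center_point_box=0` (boxes given as `[y1, x1, y2, x2]` corner pairs). It shows the main point of difference from a single-class NMS: suppression runs independently per batch and per class, and the output is a list of `[batch_index, class_index, box_index]` triples.

```python
import numpy as np


def box_iou(a, b):
    """IoU of two [y1, x1, y2, x2] boxes; corners may be given in either order."""
    ay1, ay2 = min(a[0], a[2]), max(a[0], a[2])
    ax1, ax2 = min(a[1], a[3]), max(a[1], a[3])
    by1, by2 = min(b[0], b[2]), max(b[0], b[2])
    bx1, bx2 = min(b[1], b[3]), max(b[1], b[3])
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter = inter_h * inter_w
    union = (ay2 - ay1) * (ax2 - ax1) + (by2 - by1) * (bx2 - bx1) - inter
    return inter / union if union > 0 else 0.0


def onnx_nms_reference(boxes, scores, max_output_boxes_per_class=0,
                       iou_threshold=0.0, score_threshold=None):
    """Greedy NMS per (batch, class) pair.

    boxes:  [num_batches, num_boxes, 4]
    scores: [num_batches, num_classes, num_boxes]
    Returns an int64 array of shape [num_selected, 3] holding
    [batch_index, class_index, box_index] rows.
    """
    selected = []
    for b in range(boxes.shape[0]):
        for c in range(scores.shape[1]):
            cls_scores = scores[b, c]
            # Visit boxes from highest to lowest score.
            order = np.argsort(-cls_scores, kind="stable")
            kept = []
            for idx in order:
                if len(kept) >= max_output_boxes_per_class:
                    break
                if score_threshold is not None and cls_scores[idx] <= score_threshold:
                    break  # scores are sorted, so every later box fails too
                # Keep the box only if it does not overlap a kept box too much.
                if all(box_iou(boxes[b, idx], boxes[b, k]) <= iou_threshold
                       for k in kept):
                    kept.append(int(idx))
            selected.extend([b, c, k] for k in kept)
    return np.array(selected, dtype=np.int64).reshape(-1, 3)


# One batch, one class, six boxes; three of them are near-duplicates.
boxes = np.array([[[0.0, 0.0, 1.0, 1.0],
                   [0.0, 0.1, 1.0, 1.1],
                   [0.0, -0.1, 1.0, 0.9],
                   [0.0, 10.0, 1.0, 11.0],
                   [0.0, 10.1, 1.0, 11.1],
                   [0.0, 100.0, 1.0, 101.0]]], dtype=np.float32)
scores = np.array([[[0.9, 0.75, 0.6, 0.95, 0.5, 0.3]]], dtype=np.float32)

result = onnx_nms_reference(boxes, scores, max_output_boxes_per_class=3,
                            iou_threshold=0.5, score_threshold=0.0)
print(result.tolist())  # [[0, 0, 3], [0, 0, 0], [0, 0, 5]]
```

The data-dependent number of kept boxes per class is what makes the output shape dynamic, which is one plausible reason a loop-based formulation is a natural fit here.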
