trevor-m commented on pull request #7154: URL: https://github.com/apache/tvm/pull/7154#issuecomment-758329448
I agree with you both. For class-aware NMS, the [batch, num_anchors, 6] format seems very inefficient: every anchor has to be scanned just to see whether its class matches. A [batch, num_classes, num_anchors, 5] format would give us a well-defined, contiguous slice of memory where all same-class anchors live.

> TF takes a 1D tensor of scores and concats it to the boxes before performing get_valid_counts and nms. I'm not sure if the rest of the TF graph is handling the loop over batch size and classes.

That's correct. TF's NMS handles only a single class and a single batch, so the TF graph loops over batches and classes. To do that, it uses [tf.map_fn](https://www.tensorflow.org/api_docs/python/tf/map_fn), so each NMS invocation can still run in parallel. However, this turns into a tangle of control-flow operators and TensorArrays, so Relay isn't able to recover the same parallelization. This PR's graph rewrite could benefit TF OD models as well, but the pattern to match is much more complicated for TF.
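To make the layout argument concrete, here is a rough NumPy sketch (shapes and names are illustrative, not TVM's actual API) of why the per-class layout avoids the full-anchor scan:

```python
import numpy as np

batch, num_anchors, num_classes = 2, 8, 3

# Fused layout: [batch, num_anchors, 6], each row is
# [class_id, score, x1, y1, x2, y2].
fused = np.random.rand(batch, num_anchors, 6)
fused[..., 0] = np.random.randint(0, num_classes, (batch, num_anchors))

# Gathering one class's anchors requires masking over *all* anchors,
# and the result is a scattered copy, not a contiguous region.
cls = 1
mask = fused[0, :, 0] == cls
same_class_fused = fused[0, mask, 1:]   # advanced indexing -> copy

# Per-class layout: [batch, num_classes, num_anchors, 5], each row is
# [score, x1, y1, x2, y2]. One class is just a plain slice.
per_class = np.random.rand(batch, num_classes, num_anchors, 5)
same_class = per_class[0, cls]          # contiguous view, no scan

assert same_class.shape == (num_anchors, 5)
assert same_class.base is per_class     # a view into the original buffer
```

In the per-class layout, a kernel can be launched directly on `per_class[b, c]` for each (batch, class) pair, which is essentially the parallelism TF reaches for with `tf.map_fn`, but without the control-flow machinery.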
