trevor-m commented on pull request #7154:
URL: https://github.com/apache/tvm/pull/7154#issuecomment-758329448


   I completely agree with you guys.
   
   For class-aware NMS, the [batch, num_anchors, 6] format seems very 
inefficient. Every anchor has to be scanned just to find the ones whose class 
matches. A [batch, num_classes, num_anchors, 5] format would instead give us a 
nicely defined, contiguous slice of memory where the same-class anchors are 
located.
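   To illustrate the layout difference, here is a NumPy sketch (made-up shapes, not TVM code): with [batch, num_anchors, 6] the class id is stored per anchor, so selecting one class means masking over every anchor, while [batch, num_classes, num_anchors, 5] makes the same-class anchors one contiguous slice.

```python
import numpy as np

batch, num_classes, num_anchors = 2, 3, 8

# Fused layout: each anchor row is [class_id, score, x1, y1, x2, y2].
fused = np.zeros((batch, num_anchors, 6), dtype=np.float32)
fused[..., 0] = np.random.randint(0, num_classes, (batch, num_anchors))

# Selecting class 1 requires scanning all anchors (boolean mask + gather).
mask = fused[0, :, 0] == 1
class1_fused = fused[0, mask]          # gathered copy, anchors were scattered

# Class-major layout: each anchor row is [score, x1, y1, x2, y2].
class_major = np.zeros((batch, num_classes, num_anchors, 5), dtype=np.float32)

# The same-class anchors are one contiguous slice -- no scan needed.
class1 = class_major[0, 1]
assert class1.shape == (num_anchors, 5)
assert class1.flags["C_CONTIGUOUS"]
```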
   
   > TF takes a 1D tensor of scores and concats it to the boxes before 
performing get_valid_counts and nms. I'm not sure if the rest of the TF graph 
is handling the loop over batch size and classes.
   
   That's correct. TF's NMS only handles a single class and a single batch, so 
the TF graph loops over batches and classes. To do that, it uses 
[tf.map_fn](https://www.tensorflow.org/api_docs/python/tf/map_fn), so the 
execution of each NMS call can actually still run in parallel. However, this 
turns into a mess of control-flow operators and TensorArrays, so Relay isn't 
able to do the same parallelization. This PR's graph rewrite could actually 
benefit TF OD models as well, but the pattern is a lot more complicated for TF.
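   For reference, the single-class, single-batch NMS that the TF graph maps over the batch and class axes can be sketched in plain Python. This is a toy greedy NMS with a hypothetical helper name, not TF's actual kernel:

```python
import numpy as np

def nms_single(boxes, scores, iou_thresh=0.5):
    """Toy single-class greedy NMS: keep the highest-scoring box, suppress
    overlapping boxes, repeat. TF maps one such call per (batch, class)."""
    order = np.argsort(scores)[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = order[1:]
        # IoU of box i against the remaining boxes.
        x1 = np.maximum(boxes[i, 0], boxes[rest, 0])
        y1 = np.maximum(boxes[i, 1], boxes[rest, 1])
        x2 = np.minimum(boxes[i, 2], boxes[rest, 2])
        y2 = np.minimum(boxes[i, 3], boxes[rest, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_thresh]
    return keep

boxes = np.array([[0, 0, 10, 10], [1, 1, 11, 11], [20, 20, 30, 30]], dtype=np.float32)
scores = np.array([0.9, 0.8, 0.7], dtype=np.float32)
kept = nms_single(boxes, scores)
# Boxes 0 and 1 overlap heavily (IoU ~0.68), so only boxes 0 and 2 survive.
```

   The class-major layout above makes this mapping trivial to express as a dense loop, which is exactly what the pattern rewrite needs to recover.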

