masahi edited a comment on pull request #8174:
URL: https://github.com/apache/tvm/pull/8174#issuecomment-854239897


   lol thrust is the bottleneck now? Note that our sorting performance improved 
thanks to https://github.com/apache/tvm/pull/7611, so thrust is no longer a 
requirement for good performance. Also, the new implementation does not call 
`get_valid_count`, which uses thrust for an exclusive scan over a relatively 
large input (the total number of boxes, which in our case is 
num_boxes * num_class).
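
   To illustrate why that scan runs over every box, here is a minimal sketch 
of the exclusive-scan compaction pattern in plain Python. It is not the actual 
`get_valid_count` code; the helper names `exclusive_scan` and `compact_valid`, 
the score threshold, and the sample scores are all made up for illustration.

```python
from itertools import accumulate


def exclusive_scan(flags):
    """Exclusive prefix sum: out[i] is the sum of flags[0..i-1]."""
    if not flags:
        return []
    inclusive = list(accumulate(flags))
    return [0] + inclusive[:-1]


def compact_valid(scores, score_threshold):
    """Hypothetical helper: gather indices of boxes above the threshold.

    The scan touches every element of the flattened input, so its cost
    grows with num_boxes * num_class even when few boxes are valid.
    """
    flags = [1 if s > score_threshold else 0 for s in scores]
    positions = exclusive_scan(flags)
    if flags:
        valid_count = positions[-1] + flags[-1]
    else:
        valid_count = 0
    kept = [None] * valid_count
    for i, flag in enumerate(flags):
        if flag:
            # positions[i] is the output slot for the i-th input box
            kept[positions[i]] = i
    return valid_count, kept


# Illustrative input: num_boxes = 4, num_class = 2, flattened to 8 scores.
scores = [0.9, 0.1, 0.4, 0.8, 0.05, 0.7, 0.3, 0.6]
valid_count, kept = compact_valid(scores, score_threshold=0.5)
print(valid_count, kept)
```

   On a GPU the scan is done in parallel, but it still has to read the whole 
flattened input, which is why skipping it matters more as the box count grows.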
   
   I see that the number of boxes is quite small in your model. I believe the 
new implementation would be much faster when the number of boxes is large. Do 
you have other models, preferably one with more input boxes?
   
   I've been testing tf2 models from 
https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md.
Their ssd mv2 model (FPNLite 320x320) has 12480 boxes, ssd resnet 50 v1 has 
more than 50000, and efficient det2 has 110484 boxes.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

