Laurawly opened a new pull request #5339: [TOPI] Improve get_valid_count and 
nms performance for CUDA
URL: https://github.com/apache/incubator-tvm/pull/5339
 
 
   In this PR, I update the object detection ops `get_valid_count` and `nms` by 
removing data rearrangement on the GPU. `get_valid_count` performs two 
computations: counting the number of valid elements, and rearranging the valid 
elements to the front of the array while marking the invalid elements as -1. By 
removing the rearrangement and moving it into the argsort inside `nms`, we 
get a 266x speedup for the `get_valid_count` op in the `ssd_resnet50_v1` model. I 
also remove data rearrangement from `nms` on the GPU. Although this gives 
`argsort` in `nms` extra work, removing the unnecessary data rearrangement 
yields another 7x speedup for `nms`. Note that because `get_valid_count` has 
changed, the old topi/relay tests no longer pass, but the end-to-end object 
detection accuracy does not drop. Here is a performance comparison of these two 
ops in `ssd_resnet50_v1` with input size (1, 3, 512, 512), built with Thrust 
enabled.

   | Operator | Time w/o this PR (ms) | Time w/ this PR (ms) | Speedup |
   | --- | --- | --- | --- |
   | `get_valid_count` | 10631.1 | 38.325 | 266x |
   | `non_max_suppression` | 6060.23 | 852.702 | 7x |
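
A minimal pure-Python sketch of the data flow described above, to make the change concrete. It is not the CUDA implementation in this PR: the box layout `[class_id, score, x1, y1, x2, y2]`, the validity rule (score above a threshold only), and the function names `get_valid_count_old`, `get_valid_count_new`, `iou` and `nms` are hypothetical, and the NMS here is greedy and class-agnostic. The old path compacts valid boxes to the front and pads with -1; the new path only counts, and the score argsort inside `nms` pushes invalid boxes to the back instead.

```python
# Hypothetical sketch of the data flow in this PR; not TVM's CUDA code.
# Assumptions: each box is [class_id, score, x1, y1, x2, y2], a box is
# valid when score > score_threshold, and NMS is greedy and class-agnostic.


def get_valid_count_old(boxes, score_threshold):
    """Old behavior: count valid boxes AND move them to the front,
    filling the remaining rows with -1."""
    valid = [b for b in boxes if b[1] > score_threshold]
    width = len(boxes[0]) if boxes else 0
    padding = [[-1.0] * width for _ in range(len(boxes) - len(valid))]
    return len(valid), valid + padding


def get_valid_count_new(boxes, score_threshold):
    """New behavior: only count; the input is returned unrearranged."""
    count = sum(1 for b in boxes if b[1] > score_threshold)
    return count, boxes


def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, score_threshold, iou_threshold):
    """Greedy NMS where the argsort does the rearranging: invalid boxes get
    a -inf key so they sort last, and only the first valid_count sorted
    indices are visited."""
    valid_count, data = get_valid_count_new(boxes, score_threshold)

    def sort_key(i):
        score = data[i][1]
        return score if score > score_threshold else float("-inf")

    order = sorted(range(len(data)), key=sort_key, reverse=True)[:valid_count]
    kept = []
    for i in order:
        # Keep box i only if it does not overlap too much with any kept box.
        if all(iou(data[i][2:], data[j][2:]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept  # indices into the original, unrearranged input


boxes = [
    [0, 0.9, 0, 0, 10, 10],
    [0, 0.1, 0, 0, 10, 10],    # score below threshold: invalid
    [0, 0.8, 1, 1, 10, 10],    # overlaps box 0 with IoU 0.81
    [0, 0.7, 20, 20, 30, 30],  # disjoint from box 0
]
print(get_valid_count_old(boxes, 0.5)[0])  # 3
print(nms(boxes, 0.5, 0.5))  # [0, 3]
```

Because this `nms` returns indices into the original array, the compacted copy built by `get_valid_count_old` is never needed; the extra cost moves into the sort, which is the trade-off described above for `argsort` in `nms`.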
   
   Also, in this PR, I fix a minor bug in deformable conv2d.
   @icemelon9 @kevinthesun @vinx13 please review.
   

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
[email protected]


With regards,
Apache Git Services
