masahi edited a comment on pull request #7154:
URL: https://github.com/apache/tvm/pull/7154#issuecomment-750589039


   > When we do the nested loop, we always have to check if the id of instance 
k matches the id of instance j. Since the input shape is (batch_size, 
num_anchors, features), and features = 6 here, I wouldn't be surprised if 
checking the instance of k ends up reading all of the features of k into 
registers, and that memory read is the expensive operation. Once it's in 
memory, actually doing the iou calculation is relatively cheap, so skipping it 
doesn't help that much.
   
   That's highly possible. Looking at this if condition in the triangle inner 
loop:
   
https://github.com/apache/tvm/blob/9956b5b859a24c8f4dec1abf308723b1257ffc66/python/tvm/topi/cuda/nms.py#L535-L540
   
   Previously, `force_suppress` was always True, so this condition short-circuits and the accesses to `out[base_idx + offset_k + id_index]` and `out[base_idx + offset_j + id_index]` just below never happen. But now, to make NMS class aware, I had to change `force_suppress` to False, so those two accesses always happen. This may be cancelling out the speedup from the reduced IOU tests. Storing the class IDs in a separate 1D tensor may help.
   
   That brings me to my pain point with our NMS API: I believe it needs to be reworked. The current way of packing class ids and scores together with the bbox coordinates is a design mistake we inherited from MXNet. To store class ids, I have to cast them to float32 and pass an appropriate `id_index`. Since our NMS API also requires scores to be packed with the bboxes, I had to update `score_index` too, and every frontend except MXNet needs to do this concatenation. The worst part is that the very first thing the NMS IR does is extract a 1D score tensor from the packed data, so I see no good reason to pack the score tensor with the bbox coordinates in the first place.
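   A small sketch of the round-trip I mean (illustrative only; `pack_boxes` and `extract_scores` are hypothetical helpers, not TVM APIs):

```python
# Hypothetical illustration of the packed layout: each box row is
# [class_id, score, x1, y1, x2, y2], i.e. id_index=0, score_index=1.

def pack_boxes(class_ids, scores, boxes):
    # What frontends have to do today: cast integer class ids to float
    # and concatenate everything into one (num_anchors, 6) tensor.
    return [
        [float(cid), score] + list(box)
        for cid, score, box in zip(class_ids, scores, boxes)
    ]

def extract_scores(packed, score_index=1):
    # The first thing the NMS IR does: recover the 1D score tensor,
    # undoing the concatenation the frontend just performed.
    return [row[score_index] for row in packed]
```

   If ids and scores were accepted as separate tensors, both the float32 cast and this pack/unpack round-trip would disappear.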
                               


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
