[GitHub] [tvm] masahi edited a comment on pull request #7257: [TOPI] Improve memory layout inside GPU NMS kernel

GitBox Tue, 12 Jan 2021 12:03:35 -0800


masahi edited a comment on pull request #7257:
URL: https://github.com/apache/tvm/pull/7257#issuecomment-758901561



   I don't quite follow, maybe you are missing something? 
   
   First, this PR doesn't change our NMS API, it only changes the buffer 
layouts used internally. 
   
   Second, the final concat is only required for MXNet, which uses 
`return_indices=False`. Our NMS returns something completely different 
depending on `return_indices` flag : If True, it returns a big output tensor 
packed with bboxes, scores and class ids, with invalid entries indicated by -1.
   
   (The valid entries are supposed to move to the top, if ` invalid_to_bottom` 
flag is True. But our GPU NMS kernel ignores this argument and the output is 
not reordered. This is another difference with CPU implementation, I think this 
is a bug)
   
   
https://github.com/apache/tvm/blob/4e8cc4fc26e931e38017d198d29f45cba04f5a60/python/tvm/topi/cuda/nms.py#L762-L763
   
   If `return_indices=True`, which applies to TF, ONNX, and PyTorch, we only 
return survived box indices, so there is no need to concat bboxes, scores, and 
class ids. For this case, there is zero additional overhead after this PR, it 
is just that concat before NMS done by frontends are now completely pointless.


----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]

[GitHub] [tvm] masahi edited a comment on pull request #7257: [TOPI] Improve memory layout inside GPU NMS kernel

Reply via email to