mbrookhart edited a comment on pull request #7123: URL: https://github.com/apache/tvm/pull/7123#issuecomment-756239725
Looking at the code, assuming you have thrust enabled, this should be kernel0: https://github.com/apache/tvm/blob/9815ae2d9e17eece1a1009eb6436c80f931c734e/python/tvm/topi/cuda/nms.py#L798-L811

The thrust argsort won't get a number: https://github.com/apache/tvm/blob/9815ae2d9e17eece1a1009eb6436c80f931c734e/python/tvm/topi/cuda/nms.py#L818-L820

And this should be kernel1: https://github.com/apache/tvm/blob/9815ae2d9e17eece1a1009eb6436c80f931c734e/python/tvm/topi/cuda/nms.py#L543-L579

That could have threads `(1,1,1),(1024,1,1)` if we have batch_size=1 and num_anchors <= 1024.

I'm not seeing anything in there that jumps out as having an issue, though. Every use of j is guarded by an if scope with j < num_anchors, j < nkeep, or j < valid_count, and nkeep <= valid_count. The only way it could fail is if valid_count > num_anchors, so possibly it's failing because my changes to get_valid_count are returning the wrong valid_count.

@trevor-m any chance we can dump the inputs/attrs for get_valid_count so I can make a unit test to check that hypothesis? I haven't been able to get it to fail with random inputs, but possibly there's an edge case in my exclusive_scan algorithm for this input data.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: [email protected]
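
The hypothesis in the comment above (that get_valid_count might return a valid_count larger than num_anchors) could be checked against a small host-side reference. The following is only a sketch in plain NumPy, not the TVM implementation: the function names `exclusive_scan`, `reference_valid_count`, and `check_valid_count` are hypothetical, and it assumes a box counts as valid when its score is strictly greater than the threshold, with scores laid out as `(batch_size, num_anchors)`.

```python
import numpy as np


def exclusive_scan(flags):
    """Exclusive prefix sum along the anchor axis: out[b, i] = sum(flags[b, :i])."""
    # Inclusive cumulative sum minus the element itself gives the exclusive scan.
    return np.cumsum(flags, axis=1) - flags


def reference_valid_count(scores, score_threshold):
    """Hypothetical reference: per-batch count of boxes above the threshold.

    Assumes scores has shape (batch_size, num_anchors) with num_anchors >= 1.
    """
    flags = (scores > score_threshold).astype(np.int64)
    scan = exclusive_scan(flags)
    # The total is the last exclusive-scan entry plus the last flag.
    return scan[:, -1] + flags[:, -1]


def check_valid_count(valid_count, num_anchors):
    """Invariant the guarded loops rely on: 0 <= valid_count <= num_anchors."""
    valid_count = np.asarray(valid_count)
    return bool(np.all((valid_count >= 0) & (valid_count <= num_anchors)))


# Small made-up input: two of the four anchors exceed the threshold.
scores = np.array([[0.9, 0.1, 0.5, -1.0]])
counts = reference_valid_count(scores, score_threshold=0.2)
ok = check_valid_count(counts, num_anchors=scores.shape[1])
```

Comparing the device output against such a reference on the dumped inputs would show whether valid_count ever exceeds num_anchors for that data.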
