[GitHub] [tvm] mbrookhart edited a comment on pull request #7123: Parallelize cumsum in get_valid_counts

GitBox Thu, 07 Jan 2021 08:55:39 -0800


mbrookhart edited a comment on pull request #7123:
URL: https://github.com/apache/tvm/pull/7123#issuecomment-756239725

Looking at the code, assuming you have thrust enabled, this should be
kernel0:

https://github.com/apache/tvm/blob/9815ae2d9e17eece1a1009eb6436c80f931c734e/python/tvm/topi/cuda/nms.py#L798-L811
The thrust argsort won't get a kernel number:

https://github.com/apache/tvm/blob/9815ae2d9e17eece1a1009eb6436c80f931c734e/python/tvm/topi/cuda/nms.py#L818-L820
And this should be kernel1:

https://github.com/apache/tvm/blob/9815ae2d9e17eece1a1009eb6436c80f931c734e/python/tvm/topi/cuda/nms.py#L543-L579

That could have threads `(1,1,1),(1024,1,1)` if we have batch_size=1 and
num_anchors <= 1024. I'm not seeing anything in there that jumps out as having
an issue, though. Every use of j is guarded by an if scope with j < num_anchors,
j < nkeep, or j < valid_count, and nkeep <= valid_count. The only way it could
fail is if valid_count > num_anchors...
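
To make that bound argument concrete, here is a minimal sketch in plain Python. It is not the TVM kernel: `out_of_bounds_threads` is a hypothetical helper, and it assumes a buffer of `num_anchors` elements indexed by a thread id `j` under the guards listed above.

```python
def out_of_bounds_threads(num_anchors, valid_count, nkeep, num_threads=1024):
    """Return the thread ids j that pass a guard but index past num_anchors.

    Hypothetical model of the guards described above, not the real kernel.
    """
    # The comment states nkeep <= valid_count, so clamp to keep that invariant.
    nkeep = min(nkeep, valid_count)
    bad = []
    for j in range(num_threads):
        # A thread touches the buffer if any one of the guards lets it through.
        passes_guard = j < num_anchors or j < nkeep or j < valid_count
        if passes_guard and j >= num_anchors:
            bad.append(j)
    return bad
```

With `valid_count <= num_anchors` the list is always empty; only a `valid_count` larger than `num_anchors` lets a thread through with an index past the end of the buffer.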

So possibly it's failing because my changes to get_valid_counts are returning
the wrong valid_count.

@trevor-m any chance we can dump the inputs/attrs for get_valid_counts so I
can make a unit test to check that hypothesis? I haven't been able to get it to
fail with random inputs, but possibly there's an edge case in my exclusive_scan
algorithm for this input data.
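
As a rough idea of what such a unit test could look like, here is a sketch in plain Python. It does not call TVM: `exclusive_scan` is a sequential reference, and `find_counterexample` is a hypothetical harness that compares any candidate scan against it on all-zero, all-one, and random 0/1 flag vectors, with sizes chosen around powers of two as an assumption about where a parallel scan might break.

```python
import random


def exclusive_scan(flags):
    """Sequential reference: out[i] is the sum of flags[0..i-1]."""
    out, running = [], 0
    for f in flags:
        out.append(running)
        running += f
    return out


def valid_count_from_scan(flags):
    """Number of valid entries: last scan value plus the last flag."""
    if not flags:
        return 0
    return exclusive_scan(flags)[-1] + flags[-1]


def find_counterexample(scan_fn,
                        sizes=(1, 2, 3, 31, 32, 33, 1023, 1024, 1025),
                        trials=20, seed=0):
    """Return the first flag vector where scan_fn disagrees with the
    reference, or None if every tested input agrees."""
    rng = random.Random(seed)
    for n in sizes:
        # Edge cases first (nothing valid, everything valid), then random.
        cases = [[0] * n, [1] * n]
        cases += [[rng.randint(0, 1) for _ in range(n)] for _ in range(trials)]
        for flags in cases:
            if list(scan_fn(flags)) != exclusive_scan(flags):
                return flags
    return None
```

Passing a wrapper around the real scan as `scan_fn` would return the failing flags, if any, which could then be replayed together with the dumped inputs. `valid_count_from_scan(flags)` can never exceed `len(flags)`, which is the invariant the kernel above relies on.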

