kevinthesun commented on a change in pull request #7117:
URL: https://github.com/apache/tvm/pull/7117#discussion_r544558368
##########
File path: python/tvm/topi/cuda/nms.py
##########
@@ -754,7 +782,22 @@ def non_max_suppression(
)
score_axis = score_index
score_shape = (batch_size, num_anchors)
- score_tensor = te.compute(score_shape, lambda i, j: data[i, j, score_axis], tag=tag.ELEMWISE)
+ data_buf = tvm.tir.decl_buffer(data.shape, data.dtype, "data_buf", data_alignment=8)
Review comment:
When the NMS workload is large, as in RCNN models, the general CUDA
injective schedule can still cause a runtime error even with the improvements
in this PR. More broadly, any dynamic injective op can hit runtime issues with
the current uniform CUDA injective schedule.
The problem lies in the CUDA injective schedule rather than in NMS itself.
We may need to revisit this part later for GPU dynamic ops and find a better,
more general solution (together with more tests).
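
As background for the point above, here is a minimal sketch in plain Python of how a uniform, flat launch configuration for an elementwise (injective) kernel depends on the element count. It is not TVM's schedule code: `flat_launch_config`, `threads_per_block`, and `max_blocks` are hypothetical names chosen for illustration. It only shows that when the element count is known at run time, the grid size is a run-time quantity too, so a single fixed configuration is not guaranteed to suit every workload size.

```python
def ceil_div(a, b):
    """Integer ceiling division for positive b."""
    return (a + b - 1) // b


def flat_launch_config(num_elements, threads_per_block=256, max_blocks=None):
    """Return (blocks, threads, elements_per_thread) for a flat 1-D kernel.

    Hypothetical helper for illustration only; not part of TVM.
    Without max_blocks, every thread handles one element and the block
    count grows linearly with num_elements. With max_blocks, the block
    count is capped and each thread loops over several elements instead.
    """
    if num_elements <= 0:
        return (0, threads_per_block, 0)

    blocks = ceil_div(num_elements, threads_per_block)
    elements_per_thread = 1

    if max_blocks is not None and blocks > max_blocks:
        # Cap the grid and make up the difference with a per-thread loop.
        elements_per_thread = ceil_div(blocks, max_blocks)
        blocks = max_blocks

    return (blocks, threads_per_block, elements_per_thread)


if __name__ == "__main__":
    # Small workload: one element per thread is enough.
    print(flat_launch_config(1000))
    # Large workload with a capped grid: each thread covers several elements.
    print(flat_launch_config(1_000_000, max_blocks=256))
```

Capping the grid and looping per thread is one common way to keep a flat launch within fixed bounds; whether it suits dynamic ops here is a question for the follow-up work mentioned above.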