mbrookhart commented on a change in pull request #7117:
URL: https://github.com/apache/tvm/pull/7117#discussion_r544451189



##########
File path: python/tvm/topi/cuda/nms.py
##########
@@ -95,23 +94,23 @@ def rearrange_indices_out_ir(data, output, valid_box_count):
     with ib.new_scope():
         i = te.thread_axis("blockIdx.x")
         ib.scope_attr(i, "thread_extent", batch_size)
-        valid_idx = ib.allocate("int32", (1), name="valid_idx", scope="local")
-        valid_idx[0] = 0
+        valid_idx = ib.allocate("int32", (batch_size,), name="valid_idx", scope="local")

Review comment:
       We can't allocate a buffer with a dynamic shape, which is why this test 
is failing. I'm not sure I understand why this change is needed. Since we're 
threading over the batch size, a thread-local variable of size 1 is effectively 
a scratch pad of size batch_size distributed over the threads.
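
To illustrate that last point, here is a rough plain-Python simulation. It is not TVM IR builder code: the function name `count_valid_per_batch`, the `invalid` sentinel, and the sample data are all hypothetical, and the outer loop only stands in for the `blockIdx.x` threads. It is a sketch of the idea, assuming one thread per batch entry.

```python
# Plain-Python simulation (NOT TVM code). Each loop iteration plays the role
# of one blockIdx.x thread and owns a private one-element buffer, mirroring
# an allocation of static shape (1,) with scope="local".

def count_valid_per_batch(data, invalid=-1):
    """Hypothetical stand-in for a kernel body that counts valid entries."""
    batch_size = len(data)  # only known at run time, i.e. dynamic
    counts = []
    for i in range(batch_size):      # "thread" i handles batch entry i
        valid_idx = [0]              # thread-local, static shape (1,)
        for value in data[i]:
            if value != invalid:
                valid_idx[0] += 1
        # One private slot per thread adds up to batch_size slots overall,
        # without ever allocating a buffer of shape (batch_size,).
        counts.append(valid_idx[0])
    return counts


print(count_valid_per_batch([[3, -1, 5], [-1, -1, 7]]))  # prints [2, 1]
```

The allocation shape stays constant no matter how large the batch is; only the number of threads scales with it.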






Reply via email to