mbrookhart commented on pull request #7154: URL: https://github.com/apache/tvm/pull/7154#issuecomment-750376423
This is a bit of a shot in the dark, but I wonder if we're limited by memory access, which would explain why you don't see a performance improvement. In the nested loop, we always have to check whether the id of instance k matches the id of instance j. Since the input shape is (batch_size, num_anchors, features), and features = 6 here, I wouldn't be surprised if checking the id of k ends up reading all of the features of k into registers, and that memory read is the expensive operation. Once the data is in registers, actually doing the IoU calculation is relatively cheap, so skipping it doesn't help that much.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
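To make the hypothesis in the comment concrete, here is a rough NumPy sketch of the nested loop it describes. It is not the TVM kernel from this PR: the function names, the row layout `[class_id, score, x1, y1, x2, y2]`, and the assumption that rows are already sorted by descending score are all illustrative choices made here. The line `row_k = boxes[k]` stands in for the load of all 6 features that has to happen before the id check, whether or not the IoU is then computed.

```python
import numpy as np


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms_single_batch(boxes, iou_threshold=0.5):
    """Greedy per-class NMS over one batch element.

    boxes: array of shape (num_anchors, 6), assumed (hypothetically) to be
    laid out as [class_id, score, x1, y1, x2, y2] and sorted by descending
    score. Returns the kept indices and the number of IoU evaluations.
    """
    num_anchors = boxes.shape[0]
    suppressed = np.zeros(num_anchors, dtype=bool)
    iou_calls = 0
    for j in range(num_anchors):
        if suppressed[j]:
            continue
        for k in range(j + 1, num_anchors):
            if suppressed[k]:
                continue
            # Models the memory read: the whole row of k is fetched
            # before we even know whether the ids match.
            row_k = boxes[k]
            if row_k[0] != boxes[j, 0]:
                continue  # id mismatch: IoU skipped, but the read already happened
            iou_calls += 1
            if iou(boxes[j, 2:], row_k[2:]) > iou_threshold:
                suppressed[k] = True
    keep = [i for i in range(num_anchors) if not suppressed[i]]
    return keep, iou_calls
```

In this sketch, skipping the IoU on an id mismatch lowers `iou_calls` but leaves the number of row reads per (j, k) pair unchanged, which is the situation the comment suspects on the GPU. Whether the real kernel actually behaves this way would need profiling to confirm.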
