[GitHub] [tvm] masahi commented on pull request #7233: [TOPI] Minor perf improvement for GPU scatter

GitBox Fri, 08 Jan 2021 13:55:50 -0800


masahi commented on pull request #7233:
URL: https://github.com/apache/tvm/pull/7233#issuecomment-757016060



   Yes, there are 4 calls to 4D scatter in MaskRCNN. The old kernel took 
11.6 milliseconds on them in total, making it one of the bottlenecks as shown 
in the profile above. This change brings that down to 1.9873 milliseconds in 
total (roughly 5.8x faster), so scatter is no longer a bottleneck. This is a 
solid improvement. 
   
   I think the old kernel was slow for this input (1000, 256, 7, 7) because 
the thread block is too small (32, 1, 1) and we are launching too many 
of them (1000 * 256 * 7 = 1,792,000 blocks). 
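
   To make the launch arithmetic concrete, here is a rough sketch in plain 
Python (no TVM calls). The shape, the (32, 1, 1) block size, the block count, 
and the two timings come from this comment. The flattened 1024-thread layout 
is only a hypothetical comparison point and is not necessarily what this PR 
implements.

```python
import math

# Values quoted in the comment above.
shape = (1000, 256, 7, 7)          # 4D scatter input in MaskRCNN
old_threads_per_block = 32         # thread block (32, 1, 1)
old_num_blocks = shape[0] * shape[1] * shape[2]   # 1000 * 256 * 7

total_elements = math.prod(shape)
old_threads_launched = old_num_blocks * old_threads_per_block

# ASSUMPTION (hypothetical, for comparison only): flatten the input and
# cover it with 1024-thread blocks, one element per thread.
flat_threads_per_block = 1024
flat_num_blocks = math.ceil(total_elements / flat_threads_per_block)

# Timings quoted in the comment (total over the 4 scatter calls).
old_ms = 11.6
new_ms = 1.9873
speedup = old_ms / new_ms

print("elements:", total_elements)
print("old launch: blocks =", old_num_blocks, "threads =", old_threads_launched)
print("flattened launch (hypothetical): blocks =", flat_num_blocks)
print("speedup: %.2fx" % speedup)
```

   The old configuration launches far more threads than there are elements, 
spread over a very large number of tiny blocks, which is consistent with the 
guess above that launch configuration rather than the work per element was 
the cost.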


