Lunderberg commented on pull request #8196: URL: https://github.com/apache/tvm/pull/8196#issuecomment-857140287
Running some performance tests, it looks like the refactor has very little impact on the overall runtime. The plots below show the Q1/median/Q3 runtimes for different low-level tasks that would need to access the thread-specific resources. The only significant difference is in copying data to the device, where the runtime is slightly higher for very small buffer copies.

Benchmarking details: Used `pytest-benchmark`, mostly with default settings. The number of iterations was chosen based on the runtime of the first iteration, such that each data point is collected in ~1 second. Repeated initialization was allowed to use up to 10 seconds per data point.
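For readers who want to reproduce this style of measurement without the plugin, here is a rough standard-library sketch of the approach described: time one call, choose an iteration count that fills roughly the target duration, then report Q1/median/Q3. This is not `pytest-benchmark`'s own implementation and not the code used for the plots. The names `calibrate_iterations`, `measure_quartiles`, and `small_copy` are hypothetical, and the host-side `bytes` copy only stands in for a device copy.

```python
import statistics
import time


def calibrate_iterations(func, target_seconds=1.0):
    """Pick an iteration count from the runtime of a single first call."""
    start = time.perf_counter()
    func()
    first = time.perf_counter() - start
    # Guard against a zero reading from a very fast call.
    return max(1, int(target_seconds / max(first, 1e-9)))


def measure_quartiles(func, target_seconds=1.0, max_iterations=100_000):
    """Return (Q1, median, Q3) of per-call runtimes in seconds."""
    iterations = min(calibrate_iterations(func, target_seconds), max_iterations)
    iterations = max(iterations, 2)  # statistics.quantiles needs >= 2 samples
    samples = []
    for _ in range(iterations):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    # "inclusive" keeps the cut points within the observed min/max.
    q1, median, q3 = statistics.quantiles(samples, n=4, method="inclusive")
    return q1, median, q3


# Hypothetical stand-in workload: a very small host-side buffer copy.
_source = bytearray(64)


def small_copy():
    return bytes(_source)


if __name__ == "__main__":
    q1, median, q3 = measure_quartiles(small_copy, target_seconds=1.0)
    print(f"Q1={q1:.3e}s median={median:.3e}s Q3={q3:.3e}s")
```

Calibrating from a single call is noisy for very fast operations, which is one reason very small buffer copies are the hardest case to measure reliably.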
