Lunderberg commented on pull request #8196: URL: https://github.com/apache/tvm/pull/8196#issuecomment-855005511
Thank you @tqchen , and that makes sense with the potential overhead issues. I had been assuming that the main performance cost would be in the `std::unordered_map` lookup. I'd expect that lookup to be roughly equivalent, since while the local map requires a lookup by `std::thread::id`, the TLS requires a lookup by `device_id`. Having two threads block should be rare, since the locks are acquired non-exclusively whenever the thread's object already exists, but you are right that I didn't consider the cost of acquiring that shared lock in the first place. I'll see if I can measure the performance difference, and will also see whether I can refactor to maintain the ownership model of everything deriving from the `VulkanDeviceAPI`, while keeping the minimal overhead of TLS. Regarding the non-public apis, that makes sense. It wasn't vulkan-specific, and so I didn't want it all the way in `src/runtime/vulkan`, but hadn't considered that impact of adding to the public api. I will move `thread_map.h` (or whatever thread_local variant replaces it) to be in `src/runtime` instead. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: [email protected]
