Lunderberg commented on pull request #8196:
URL: https://github.com/apache/tvm/pull/8196#issuecomment-855005511


   Thank you @tqchen , and that makes sense with the potential overhead issues. 
 I had been assuming that the main performance cost would be in the 
`std::unordered_map` lookup.  I'd expect that lookup to be roughly equivalent, 
since while the local map requires a lookup by `std::thread::id`, the TLS 
requires a lookup by `device_id`.  Having two threads block should be rare, 
since the locks are acquired non-exclusively whenever the thread's object 
already exists, but you are right that I didn't consider the cost of acquiring 
that shared lock in the first place.
   
   I'll see if I can measure the performance difference, and will also see 
whether I can refactor to maintain the ownership model of everything deriving 
from the `VulkanDeviceAPI`, while keeping the minimal overhead of TLS.
   
   Regarding the non-public apis, that makes sense.  It wasn't vulkan-specific, 
and so I didn't want it all the way in `src/runtime/vulkan`, but hadn't 
considered that impact of adding to the public api.  I will move `thread_map.h` 
(or whatever thread_local variant replaces it) to be in `src/runtime` instead.  


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]


Reply via email to