https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81501
--- Comment #7 from Yann Droneaud <yann at droneaud dot fr> ---
(In reply to Roy Jacobson from comment #6)
> We recently upgraded our toolchain from GCC9 to GCC11, and we're seeing
> __tls_get_addr take up to 10% of total runtime under some workloads, where
> it was 1-2% before.
>
> It seems that some changes to the optimization passes in 10 or 11 have
> significantly increased the impact of this problem.

In bug #82803, comment #12 (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82803#c12), I showed a workaround I used, which might be useful until GCC treats __tls_get_addr() as returning a constant address that doesn't need to be looked up multiple times within a function.
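For readers who cannot follow the link, here is a minimal sketch of the general idea of looking up a thread-local address only once per function. It is not necessarily the workaround from bug #82803, comment #12. The names tls_counter, sum_naive and sum_cached are hypothetical, and the repeated __tls_get_addr() calls are only expected when the code is built as position-independent code with a dynamic TLS model (e.g. -fPIC in a shared library); whether a given GCC version already hoists the lookup on its own is something to check in the generated assembly.

```c
#include <stddef.h>

/* Hypothetical thread-local accumulator, for illustration only. */
static __thread unsigned long tls_counter;

/* Names the TLS variable directly inside the loop. With a dynamic TLS
 * model the compiler may emit a __tls_get_addr() call per access if it
 * does not hoist the address computation itself. */
unsigned long sum_naive(const unsigned long *v, size_t n)
{
    for (size_t i = 0; i < n; i++)
        tls_counter += v[i];
    return tls_counter;
}

/* Takes the address once and reuses the pointer. The address of a
 * thread-local object cannot change while the current thread is
 * running this function, so caching it locally is safe. */
unsigned long sum_cached(const unsigned long *v, size_t n)
{
    unsigned long *counter = &tls_counter; /* single TLS address lookup */

    for (size_t i = 0; i < n; i++)
        *counter += v[i];
    return *counter;
}
```

Both functions compute the same result; only the number of TLS address lookups the compiler is asked to perform differs. Comparing the output of `gcc -O2 -fPIC -S` for the two is a quick way to see whether the cached form helps with a particular compiler version.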