https://gcc.gnu.org/bugzilla/show_bug.cgi?id=71744
--- Comment #16 from Gleb Natapov <gleb at scylladb dot com> ---
Can you suggest an alternative to the libgcc patch? Use another TLS model? Allocate per-thread storage dynamically somehow?

About the lock array: I tried to use an rwlock, and the current one is no better than a regular lock since it takes a lock internally. There is a patch series to improve this, but even with the improved rwlock the array approach generates much better results for my test case, since it avoids lock contention in most cases. The array approach can be improved/generalized by having a lock per thread, but I think the array is a much simpler approach that delivers good results.

About programs abusing dlopen/dlclose: while I am sure there are some, it is a reasonable expectation that dlopen/dlclose on different threads will serialize them. Throwing an exception, however, while expected to be a heavy operation, is not expected to make threads execute in lockstep. Taking some extra locks should be negligible compared to the other work dlopen has to do.

About memory considerations: are those numbers so significant, especially considering the overhead each thread has in the kernel?

It would be nice to continue this discussion on the mailing list. Regardless, let's apply your patch, since it is essential to the final solution.
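
For illustration only, here is a minimal sketch of the lock-array idea discussed above. It is not the actual patch. It assumes that readers (threads unwinding an exception) each take a single lock chosen by hashing a per-thread address, while writers (dlopen/dlclose) take every lock in the array. The names `NLOCKS`, `self_slot`, `reader_lock`, `reader_unlock`, `writer_lock` and `writer_unlock`, and the array size of 64, are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

#define NLOCKS 64 /* hypothetical array size, not taken from the patch */

static pthread_mutex_t locks[NLOCKS];
static pthread_once_t locks_once = PTHREAD_ONCE_INIT;

static void locks_init(void)
{
    for (size_t i = 0; i < NLOCKS; i++)
        pthread_mutex_init(&locks[i], NULL);
}

/* Assumption: map the calling thread to a slot by hashing the address
   of a thread-local object. Two threads may share a slot. */
static size_t self_slot(void)
{
    static _Thread_local char marker;
    return (size_t)(((uintptr_t)&marker >> 6) % NLOCKS);
}

/* Reader side (e.g. a thread throwing an exception): take one lock.
   Returns the slot so the caller can release the same lock later. */
size_t reader_lock(void)
{
    pthread_once(&locks_once, locks_init);
    size_t slot = self_slot();
    pthread_mutex_lock(&locks[slot]);
    return slot;
}

void reader_unlock(size_t slot)
{
    assert(slot < NLOCKS);
    pthread_mutex_unlock(&locks[slot]);
}

/* Writer side (e.g. dlopen/dlclose): take every lock, always in
   ascending order so two writers cannot deadlock each other. */
void writer_lock(void)
{
    pthread_once(&locks_once, locks_init);
    for (size_t i = 0; i < NLOCKS; i++)
        pthread_mutex_lock(&locks[i]);
}

void writer_unlock(void)
{
    for (size_t i = NLOCKS; i-- > 0;)
        pthread_mutex_unlock(&locks[i]);
}
```

In this sketch, readers on different slots never contend with each other, and the writer pays for `NLOCKS` lock operations, which is the trade-off the comment argues is negligible next to the rest of dlopen's work.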