https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108494
--- Comment #3 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
Yeah, we only cache threads at the outermost parallelism level; for nested parallelism, threads are created and destroyed as needed. I know libomp basically never destroys threads (except for omp_pause_resource{,_all}?), but I am not convinced that is a good idea resource-wise. While it makes pointless benchmarks faster, whenever a program uses nested parallelism only briefly, say once from some library, and then doesn't need it anymore, the cached threads just waste resources. Most programs don't use omp_pause_resource{,_all}, and in libraries especially it is pretty much impossible, because the library doesn't know whether some other part of the program is actually using OpenMP.
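
To make the scenario concrete, here is a minimal C sketch of a library routine that uses nested parallelism briefly, plus an application-level call that asks the runtime to release idle resources. The names lib_nested_work and release_openmp_resources are hypothetical, and the serial fallbacks under #else are only there so the sketch also builds without -fopenmp. The runtime decides how many threads each team really gets, so the returned count is only bounded above by outer * inner.

```c
#include <assert.h>

#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks (assumption: only used when built without -fopenmp). */
typedef enum { omp_pause_soft = 1, omp_pause_hard = 2 } omp_pause_resource_t;
static void omp_set_max_active_levels(int n) { (void)n; }
static int omp_pause_resource_all(omp_pause_resource_t kind) { (void)kind; return 0; }
#endif

/* Hypothetical library entry point: runs one short nested parallel region
 * and returns how many innermost threads actually executed. */
int lib_nested_work(int outer, int inner)
{
    int count = 0;

    assert(outer > 0 && inner > 0);
    omp_set_max_active_levels(2);   /* permit one level of nesting */

    #pragma omp parallel num_threads(outer) reduction(+:count)
    {
        /* Inner teams: per the comment above, libgomp creates these
         * threads on demand and destroys them when the region ends,
         * while a runtime that caches them keeps them around. */
        #pragma omp parallel num_threads(inner) reduction(+:count)
        {
            count += 1;
        }
    }
    return count;
}

/* Hypothetical application-level hook. It must be called outside any
 * parallel region, and only by code that knows no other part of the
 * program is using OpenMP at that moment, which is exactly what a
 * library cannot know. Returns zero on success. */
int release_openmp_resources(void)
{
    return omp_pause_resource_all(omp_pause_soft);
}
```

An application that owns the whole process could call lib_nested_work(2, 2) and then release_openmp_resources() once it knows OpenMP is idle; a library has no safe point at which to make the second call.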