On Sun, Apr 12, 2020 at 12:18:35PM +0200, Aurelien Jarno wrote:

> > Significant performance impact has also been observed in less contrived
> > cases (MariaDB and Postgres), but I don't have a repro to share.
>
> But indeed what counts is number on real workloads. It would be nice to
> get numbers when those software are run against a rebuilt glibc. As
> those software are using a lot of atomics directly, it would be also
> interesting to have numbers with those software also rebuilt to use
> those new instructions.

Agreed. I don't have specific examples of real world impact at the
moment. AIUI, the most significant impact comes from the use of atomics
in pthread_mutex_lock(). Without the ARMv8.1 atomics, when multiple
threads contend for a lock, one thread will (approximately) always
obtain the lock, while the others will starve. With atomics support in
place, the probability of obtaining the lock is roughly evenly
distributed among all the threads. So any workload in which multiple
threads may contend for a lock should be a candidate to demonstrate this
problem in the real world.

> It would also be nice to have numbers to see the impact on non-ARMv8.1
> CPU on real workloads. As pointed out by Florian, and if the impact is
> negligible, it might be a good idea to enable -moutline-atomics
> globally at the GCC level so that all software can benefit from it, and
> instead of only glibc. That could be either upstream or only in Debian,
> that's probably a separate discussion. Otherwise we will likely end up
> using this non-default GCC option on all packages that runs faster with
> it.

Agreed.

> To be honest from a glibc maintenance point of view it's something I
> would like to avoid. We haven't been actively trying to remove the
> remaining optimized libraries (on i386, hurd and alpha), but we have
> tried to avoid adding new ones. The problem is not building a second
> optimized glibc, but rather providing a safe upgrade as the optimized
> and the non-optimized package have to be at the same version or one of
> them has to be disabled. This has caused many system breakages overall.

Understood, that makes sense. I wonder if it's worth investigating
techniques to improve the situation around optimized libraries. Do you
have any thoughts on what such an improvement might look like?

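For reference, the pthread_mutex_lock() contention behaviour described above could be probed with something like the following. This is only a minimal sketch, not the test behind any numbers in this thread: the names run_contention_test, worker, NTHREADS and TOTAL_ACQUISITIONS are made up for illustration, and the thread count and budget are arbitrary. Each thread counts how often it wins the lock while a shared budget is drained; a large gap between the lowest and highest count would point at the starvation pattern.

```c
#include <pthread.h>
#include <stddef.h>

#define NTHREADS 4                    /* arbitrary; hypothetical value */
#define TOTAL_ACQUISITIONS 1000000L   /* arbitrary shared budget */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static long remaining;                /* shared budget, protected by lock */
static long acquired[NTHREADS];       /* per-thread count of lock wins */

/* Take the lock repeatedly until the shared budget is used up. */
static void *worker(void *arg)
{
    long *mine = arg;

    for (;;) {
        pthread_mutex_lock(&lock);
        if (remaining == 0) {
            pthread_mutex_unlock(&lock);
            return NULL;
        }
        remaining--;
        (*mine)++;                    /* only written while holding lock */
        pthread_mutex_unlock(&lock);
    }
}

/*
 * Run one contention round. Returns the total number of acquisitions
 * (or -1 if no thread could be started) and stores the smallest and
 * largest per-thread counts in *min_out and *max_out.
 */
static long run_contention_test(long *min_out, long *max_out)
{
    pthread_t tid[NTHREADS];
    int started = 0;
    long total = 0;

    remaining = TOTAL_ACQUISITIONS;
    for (int i = 0; i < NTHREADS; i++)
        acquired[i] = 0;

    for (int i = 0; i < NTHREADS; i++) {
        if (pthread_create(&tid[i], NULL, worker, &acquired[i]) != 0)
            break;
        started++;
    }
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);
    if (started == 0)
        return -1;

    *min_out = *max_out = acquired[0];
    for (int i = 0; i < started; i++) {
        if (acquired[i] < *min_out)
            *min_out = acquired[i];
        if (acquired[i] > *max_out)
            *max_out = acquired[i];
        total += acquired[i];
    }
    return total;
}
```

Since the atomics in question live inside glibc's mutex code, the interesting comparison is the min/max spread when the same binary runs against a stock glibc and against one rebuilt with -moutline-atomics, rather than rebuilding the test itself. Threads are started one after another, so the first one gets a head start; a longer run or a start barrier would reduce that bias.
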
> Also note that the mechanism allowing a safe upgrade *does* incur a
> runtime overhead as every binary now has to test for the presence of
> /etc/ld.so.nohwcap to detect a possible upgrade of the glibc in
> progress. That's why we have disabled it on architecture not providing
> an optimized library.

Thanks for the pointer, it's interesting to see data on that. This also
suggests that it might be worthwhile to investigate a better mechanism
for identifying the availability of hardware features.

> > I've tested both options and found them to be acceptable on v8.1a (Neoverse
> > N1) and v8a (Cortex A72) CPUs. I can provide bulk test run data of the
> > various different configuration permutations if you'd like to see additional
> > data.
>
> As said above I think we would need more numbers on real workload to
> take a decision. Don't get me wrong I do not oppose on improving atomics
> on ARMv8.1, but I would like that we chose the best option. Also if we
> go with the -moutline-atomics option, I believe it rather has to be a
> ARM porters decision than a glibc maintainers decision (hence the Cc:).

I'll see what I can come up with. Do the arm porters have any opinions
on this matter?

noah
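
P.S. On identifying hardware features: for context, a process on arm64 Linux can already ask the kernel whether the ARMv8.1 atomics are present through the auxiliary vector. The sketch below is illustrative only and is not a proposed replacement for the /etc/ld.so.nohwcap check; have_lse_atomics is a hypothetical helper name, and the fallback definition of HWCAP_ATOMICS assumes the arm64 value normally supplied by <asm/hwcap.h>.

```c
#include <stdbool.h>
#include <sys/auxv.h>   /* getauxval(), AT_HWCAP (Linux, glibc) */

/* Bit 8 of AT_HWCAP on arm64 Linux; normally comes from <asm/hwcap.h>. */
#ifndef HWCAP_ATOMICS
#define HWCAP_ATOMICS (1UL << 8)
#endif

/*
 * Hypothetical helper: true if the kernel reports ARMv8.1 LSE atomics.
 * On any architecture other than aarch64 it simply reports false.
 */
static bool have_lse_atomics(void)
{
#if defined(__aarch64__)
    return (getauxval(AT_HWCAP) & HWCAP_ATOMICS) != 0;
#else
    return false;
#endif
}
```

The lookup reads a value the kernel placed in the process at exec time, so it involves no file system access.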