On 2020-04-10 13:16, Noah Meyerhans wrote:
> Package: src:glibc
> Version: 2.30-4
> Severity: wishlist
> X-Debbugs-CC: debian-...@lists.debian.org
> The ARMv8.1 spec, as implemented by the ARM Neoverse N1 processor,
> introduces a set of instructions [1] that result in significant performance
> improvements for multithreaded applications.  Sample code demonstrating the
> performance improvements is attached.  When run on a 16-core Neoverse N1
> host with glibc 2.30-4, runtimes vary significantly, ranging from lows
> around 250ms to highs around 15 seconds.  When linked against glibc rebuilt
> with support for these instructions, runtimes are consistently <50ms.

This is an impressive improvement!
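
The attached reproducer is not included in this archive. Below is a minimal sketch of the kind of workload that benefits. It is an assumption, not Noah's sample code: several threads increment one shared counter with GCC's __atomic_fetch_add builtin. On ARMv8.0 each increment is an exclusive load/store retry loop, which can fail repeatedly under contention. Built for ARMv8.1, it can use a single LSE atomic instruction. The thread and iteration counts are arbitrary.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdint.h>
#include <time.h>

/* Arbitrary illustrative parameters (assumptions, not from the reproducer). */
#define NTHREADS 16
#define ITERS    200000L

static uint64_t counter; /* one cache line, contended by every thread */

static void *worker(void *arg)
{
    (void)arg;
    for (long i = 0; i < ITERS; i++)
        /* ARMv8.0: LDXR/STXR retry loop; with LSE: one LDADD-family instruction. */
        __atomic_fetch_add(&counter, 1, __ATOMIC_SEQ_CST);
    return NULL;
}

/* Runs NTHREADS workers to completion and returns the elapsed
 * wall-clock time in milliseconds, or -1.0 if thread creation failed. */
double run_contention_benchmark(void)
{
    pthread_t th[NTHREADS];
    struct timespec t0, t1;
    int created = 0;

    counter = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    while (created < NTHREADS &&
           pthread_create(&th[created], NULL, worker, NULL) == 0)
        created++;
    for (int i = 0; i < created; i++)
        pthread_join(th[i], NULL); /* join whatever was started */
    clock_gettime(CLOCK_MONOTONIC, &t1);

    if (created < NTHREADS)
        return -1.0;
    return (double)(t1.tv_sec - t0.tv_sec) * 1e3 +
           (double)(t1.tv_nsec - t0.tv_nsec) / 1e6;
}
```

To compare, call run_contention_benchmark() from a small main() and build it twice: once with `gcc -O2 -pthread`, and once with `-march=armv8.1-a` added. Run both binaries on an LSE-capable host.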

> Significant performance impact has also been observed in less contrived
> cases (MariaDB and Postgres), but I don't have a repro to share.

But indeed what counts is the numbers on real workloads. It would be
nice to get numbers when this software is run against a rebuilt glibc.
As it uses a lot of atomics directly, it would also be interesting to
have numbers with the software itself rebuilt to use those new
instructions.

> Gcc provides two ways to enable support for these instructions at build
> time.  The simplest, and least disruptive, is to enable -moutline-atomics
> globally in the arm64 glibc build.  As described at [2], this option enables
> runtime checks for the availability of the atomic instructions.  If found,
> they are used, otherwise ARMv8.0 compatible code is used.  The drawback of
> this option is that the check happens at runtime, thus introducing some
> overhead on all arm64 installations.

It would also be nice to have numbers showing the impact on non-ARMv8.1
CPUs on real workloads. As pointed out by Florian, if the impact is
negligible, it might be a good idea to enable -moutline-atomics
globally at the GCC level so that all software can benefit from it,
instead of only glibc. That could be done either upstream or only in
Debian; that's probably a separate discussion. Otherwise we will likely
end up using this non-default GCC option on all packages that run faster
with it.
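
The mechanism behind -moutline-atomics can be modelled roughly as follows. Each atomic operation becomes a call to a small out-of-line helper, and that helper branches on a flag initialised once from the kernel's HWCAP_ATOMICS bit. This is a simplified sketch, not libgcc's implementation. The names `have_lse`, `init_have_lse` and `outline_fetch_add` are hypothetical. Both branches use GCC builtins because portable C cannot force a particular instruction. Every arm64 system pays the call plus the extra load and branch on each atomic operation.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h> /* getauxval, AT_HWCAP (and HWCAP_* on glibc) */
#endif

/* Hypothetical flag, set once at startup. */
static bool have_lse;

void init_have_lse(void)
{
#if defined(__aarch64__) && defined(__linux__) && defined(HWCAP_ATOMICS)
    have_lse = (getauxval(AT_HWCAP) & HWCAP_ATOMICS) != 0;
#else
    have_lse = false; /* not AArch64 Linux: always take the fallback path */
#endif
}

/* Hypothetical out-of-line helper: every atomic add goes through this
 * call and one extra load + branch on the flag. */
uint32_t outline_fetch_add(uint32_t *p, uint32_t v)
{
    if (have_lse) {
        /* A real helper would issue a single LDADD-family instruction here. */
        return __atomic_fetch_add(p, v, __ATOMIC_SEQ_CST);
    }
    /* ARMv8.0-style fallback: retry until the update succeeds
     * (a real helper would use an LDXR/STXR loop). */
    uint32_t old = __atomic_load_n(p, __ATOMIC_RELAXED);
    while (!__atomic_compare_exchange_n(p, &old, old + v, true,
                                        __ATOMIC_SEQ_CST, __ATOMIC_RELAXED))
        ; /* on failure, old is refreshed with the current value */
    return old;
}
```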

> The second option is to provide libraries built with explicit support for
> the ARM v8.1a spec via the -march=armv8.1-a flag.  This option is also
> described at [2].  This build would be incompatible with earlier versions of
> the spec, so it would need to be provided in a location where the linker
> will automatically discover it if it is usable (e.g.
> /lib/aarch64-linux-gnu/atomics/).  This does not incur any runtime overhead,
> but obviously involves an additional libc build, and the corresponding
> complexity and disk space utilization.  I'm not sure if this is an option
> that the glibc maintainers are interested in pursuing.

To be honest, from a glibc maintenance point of view it's something I
would like to avoid. We haven't been actively trying to remove the
remaining optimized libraries (on i386, hurd and alpha), but we have
tried to avoid adding new ones. The problem is not building a second
optimized glibc, but rather providing a safe upgrade, as the optimized
and the non-optimized packages have to be at the same version or one of
them has to be disabled. This has caused many system breakages overall.
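
For the second option, the loader-side logic can be pictured as consulting a capability-specific directory before the baseline one. The helper below is a hypothetical sketch of that decision, not glibc's search-path code. `candidate_libdirs` and the constants are made-up names. The `atomics` directory is the example from the proposal. The HWCAP bit is passed in so that the logic runs on any machine.

```c
#include <stddef.h>

/* Same value as HWCAP_ATOMICS on AArch64 Linux (bit 8 of AT_HWCAP). */
#define SKETCH_HWCAP_ATOMICS (1UL << 8)

/* Hypothetical directory names; "atomics" is the example from the proposal. */
static const char LIBDIR_ATOMICS[] = "/lib/aarch64-linux-gnu/atomics";
static const char LIBDIR_BASE[]    = "/lib/aarch64-linux-gnu";

/* Fills dirs with at most max directories to search, most specific
 * first, and returns how many were written. */
size_t candidate_libdirs(unsigned long hwcap, const char **dirs, size_t max)
{
    size_t n = 0;
    if ((hwcap & SKETCH_HWCAP_ATOMICS) && n < max)
        dirs[n++] = LIBDIR_ATOMICS; /* ARMv8.1-a build, only if LSE is present */
    if (n < max)
        dirs[n++] = LIBDIR_BASE;    /* baseline ARMv8.0 build, always usable */
    return n;
}
```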

Also note that the mechanism allowing a safe upgrade *does* incur a
runtime overhead, as every binary now has to test for the presence of
/etc/ld.so.nohwcap to detect a possible glibc upgrade in progress.
That's why we have disabled it on architectures not providing an
optimized library [1].
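
Assume the guard is a plain existence test, as described above. Its cost is then one extra system call at every process start-up, whether or not an upgrade is running. The sketch below is minimal. The function name is hypothetical. The path is a parameter only so the sketch can be exercised without touching /etc.

```c
#include <stdbool.h>
#include <unistd.h> /* access, F_OK */

/* Hypothetical helper modelling the upgrade guard: while the flag file
 * exists (e.g. during a glibc upgrade), optimized library directories
 * are ignored and only the baseline build is used. */
bool use_hwcap_dirs(const char *nohwcap_path)
{
    /* One extra syscall per process start, even when no upgrade is running. */
    return access(nohwcap_path, F_OK) != 0;
}
```

The loader would call use_hwcap_dirs("/etc/ld.so.nohwcap") before looking at any optimized directory.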

> I've tested both options and found them to be acceptable on v8.1a (Neoverse
> N1) and v8a (Cortex A72) CPUs.  I can provide bulk test run data of the
> various different configuration permutations if you'd like to see additional
> data.

As said above, I think we would need more numbers on real workloads to
take a decision. Don't get me wrong, I am not opposed to improving
atomics on ARMv8.1, but I would like us to choose the best option. Also,
if we go with the -moutline-atomics option, I believe it has to be an
ARM porters' decision rather than a glibc maintainers' decision (hence
the Cc:).

> I can provide patches or merge requests implementing either option, at least
> for a starting point, if you'd like to see them.

Thanks for this offer, but I don't think that's the most difficult part;
it's fairly straightforward to go for either of those options once a
decision is made.


[1] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=908928

Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurel...@aurel32.net                 http://www.aurel32.net
