On Wed, Apr 22, 2020 at 05:48:27PM +0100, Steve McIntyre wrote:
> Hi folks!


> I'm adding a CC to Steve Capper, a colleague in Arm who's our expert
> here for this kind of question. He's also a DM in Debian... :-)

Now I feel guilty about not doing enough Debian :-).

> On Tue, Apr 21, 2020 at 06:37:07PM -0400, Noah Meyerhans wrote:
> >On Sun, Apr 12, 2020 at 12:18:35PM +0200, Aurelien Jarno wrote:
> >
> >> It would also be nice to have numbers to see the impact on non-ARMv8.1
> >> CPUs with real workloads. As pointed out by Florian, if the impact is
> >> negligible, it might be a good idea to enable -moutline-atomics
> >> globally at the GCC level so that all software can benefit from it,
> >> instead of only glibc. That could be either upstream or only in Debian;
> >> that's probably a separate discussion. Otherwise we will likely end up
> >> using this non-default GCC option on all packages that run faster with
> >> it.
> >
> >Agreed.
> I think -moutline-atomics is probably worth enabling by default
> once we've got it (gcc 10). That's the suggestion I've heard from gcc
> folks in Arm.
> >> Also note that the mechanism allowing a safe upgrade *does* incur a
> >> runtime overhead as every binary now has to test for the presence of
> >> /etc/ld.so.nohwcap to detect a possible upgrade of the glibc in
> >> progress. That's why we have disabled it on architectures not providing
> >> an optimized library [1].
> Oh, ick. :-/
> >Thanks for the pointer, it's interesting to see data on that.  This also
> >suggests that it might be worthwhile to investigate a better mechanism
> >for identifying the availability of hardware features.
> >
> >> > I've tested both options and found them to be acceptable on v8.1a
> >> > (Neoverse N1) and v8a (Cortex A72) CPUs. I can provide bulk test run
> >> > data of the various different configuration permutations if you'd
> >> > like to see additional data.

That's good to hear!

> >> 
> >> As said above, I think we would need more numbers on real workloads to
> >> make a decision. Don't get me wrong, I do not oppose improving atomics
> >> on ARMv8.1, but I would like us to choose the best option. Also, if we
> >> go with the -moutline-atomics option, I believe it has to be an ARM
> >> porters' decision rather than a glibc maintainers' decision (hence the
> >> Cc:).
> >
> >I'll see what I can come up with.
> >
> >Do the arm porters have any opinions on this matter?
> It's a good question, and thanks for asking! I definitely think it's
> worth doing -moutline-atomics, and I'm hoping Steve can share some
> performance numbers to help convince. :-)

We ran -moutline-atomics on a mixture of development hardware running,
IIRC, some DPDK lock tests that employed C11-style atomics. As expected
there was a performance penalty, but it was on the order of 1%. The
perf boost from moving to LSE was a lot larger (and we noticed the
variance dropping a lot with LSE too).

FWIW, I'd recommend -moutline-atomics for the general case. (I used
to be a fan of the multi-lib approach, but the way the runtime selection
is implemented in gcc, with a direct branch, changed my mind :-) ).

